* Use Unix socket for Supervisor to Core communication
Reintroduce Unix socket support for Supervisor-to-Core communication
(reverted in #6735) with the addition of a feature flag gate. The
feature is now controlled by the `core_unix_socket` feature flag and
disabled by default.
When the flag is enabled and the Core version supports it, Supervisor
communicates with Core via a Unix socket at /run/os/core.sock instead
of TCP. This
eliminates the need for access token authentication on the socket path,
as Core authenticates the peer by the socket connection itself.
Key changes:
- Add FeatureFlag.CORE_UNIX_SOCKET to gate the feature
- HomeAssistantAPI: transport-aware session/url/websocket management
- WSClient: separate connect() (Unix, no auth) and connect_with_auth()
(TCP) class methods with proper error handling
- APIProxy delegates websocket setup to api.connect_websocket()
- Container state tracking for Unix session lifecycle
- CI builder mounts /run/supervisor for integration tests
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Sort feature flags alphabetically
* Drop per-call max_msg_size from WSClient
Hardcode the WebSocket message size cap to 64 MB in WSClient and remove
the parameter from WSClient.connect, connect_with_auth, _ws_connect,
and HomeAssistantAPI.connect_websocket. This was only ever overridden
by APIProxy, so threading it through four layers was unnecessary.
max_msg_size is a cap, not a pre-allocation; aiohttp only grows buffers
to the size of actual incoming messages. Supervisor's own control
channel never approaches 64 MB, so unifying the limit has no runtime
cost.
Addresses review feedback on #6742.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Use Unix socket for Supervisor to Core communication
Switch internal Supervisor-to-Core HTTP and WebSocket communication
from TCP (port 8123) to a Unix domain socket.
The existing /run/supervisor directory on the host (already mounted
at /run/os inside the Supervisor container) is bind-mounted into the
Core container at /run/supervisor. Core receives the socket path via
the SUPERVISOR_CORE_API_SOCKET environment variable, creates the
socket there, and Supervisor connects to it via aiohttp.UnixConnector
at /run/os/core.sock.
Since the Unix socket is only reachable by processes on the same host,
requests arriving over it are implicitly trusted and authenticated as
the existing Supervisor system user. This removes the token round-trip
where Supervisor had to obtain and send Bearer tokens on every Core
API call. WebSocket connections are likewise authenticated implicitly,
skipping the auth_required/auth handshake.
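A minimal sketch of the trust model above, using only the standard library. The client speaks plain HTTP/1.1 over a Unix domain socket and sends no `Authorization` header. This stands in for `aiohttp.UnixConnector`, and everything else is an assumption for illustration: the fake server, the `core.sock` file in a temporary directory, and the `trusted`/`token` replies. None of it reflects Core's actual handlers.

```python
import asyncio
import os
import tempfile


async def fake_core(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Hypothetical stand-in for Core: answers one request over the socket."""
    request = await reader.readuntil(b"\r\n\r\n")
    # Report whether the client sent a token; none is needed on this transport.
    body = b"token" if b"\r\nauthorization:" in request.lower() else b"trusted"
    writer.write(b"HTTP/1.1 200 OK\r\nContent-Length: %d\r\n\r\n" % len(body) + body)
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def get_over_unix_socket(socket_path: str, path: str) -> tuple[int, bytes]:
    """Send a GET over a Unix domain socket without an Authorization header."""
    reader, writer = await asyncio.open_unix_connection(socket_path)
    try:
        request = f"GET {path} HTTP/1.1\r\nHost: localhost\r\nConnection: close\r\n\r\n"
        writer.write(request.encode())
        await writer.drain()
        raw = await reader.read()  # the server closes the connection when done
    finally:
        writer.close()
        await writer.wait_closed()
    head, _, body = raw.partition(b"\r\n\r\n")
    status = int(head.split(b" ", 2)[1])
    return status, body


async def demo() -> tuple[int, bytes]:
    with tempfile.TemporaryDirectory() as tmp:
        socket_path = os.path.join(tmp, "core.sock")
        server = await asyncio.start_unix_server(fake_core, path=socket_path)
        async with server:
            return await get_over_unix_socket(socket_path, "/api/")


if __name__ == "__main__":
    print(asyncio.run(demo()))  # (200, b'trusted')
```

Only a process that can open the socket file can reach the server. That is why the file's location, rather than a per-request token, acts as the access control.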
Key design decisions:
- Version-gated by CORE_UNIX_SOCKET_MIN_VERSION so older Core
versions transparently continue using TCP with token auth
- LANDINGPAGE is explicitly excluded (not a CalVer version)
- Hard-fails with a clear error if the socket file is unexpectedly
missing when Unix socket communication is expected
- WSClient.connect() for Unix socket (no auth) and
WSClient.connect_with_auth() for TCP (token auth) separate the
two connection modes cleanly
- Token refresh always uses the TCP websession since it is inherently
a TCP/Bearer-auth operation
- Logs which transport (Unix socket vs TCP) is being used on first
request
Closes #6626
Related Core PR: home-assistant/core#163907
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Close WebSocket on handshake failure and validate auth_required
Ensure the underlying WebSocket connection is closed before raising
when the handshake produces an unexpected message. Also validate that
the first TCP message is auth_required before sending credentials.
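The TCP handshake order follows Home Assistant's WebSocket API. Core sends `auth_required`, the client replies with `auth` and an `access_token`, and Core answers `auth_ok`. The sketch below runs that handshake against any object with `receive_json`/`send_json`/`close` coroutines, which aiohttp's client WebSocket provides. `HomeAssistantAPIError` is a local stand-in. `authenticate` and `FakeWebSocket` are hypothetical names, not `WSClient.connect_with_auth` itself.

```python
import asyncio
from typing import Any, Protocol


class HomeAssistantAPIError(Exception):
    """Local stand-in for Supervisor's exception."""


class JsonWebSocket(Protocol):
    """The subset of aiohttp's client WebSocket that the handshake uses."""

    async def receive_json(self) -> Any: ...
    async def send_json(self, data: Any) -> None: ...
    async def close(self) -> Any: ...


async def authenticate(ws: JsonWebSocket, access_token: str) -> None:
    """TCP handshake: expect auth_required, send the token, expect auth_ok.

    The socket is closed before any HomeAssistantAPIError propagates.
    """
    try:
        first = await ws.receive_json()
        if first.get("type") != "auth_required":
            # Never send credentials to a peer that did not ask for them.
            raise HomeAssistantAPIError(f"Expected auth_required, got {first!r}")
        await ws.send_json({"type": "auth", "access_token": access_token})
        reply = await ws.receive_json()
        if reply.get("type") != "auth_ok":
            raise HomeAssistantAPIError(f"Authentication failed: {reply!r}")
    except HomeAssistantAPIError:
        await ws.close()
        raise


class FakeWebSocket:
    """Test double that replays scripted server messages."""

    def __init__(self, incoming: list[dict[str, Any]]) -> None:
        self.incoming = list(incoming)
        self.sent: list[dict[str, Any]] = []
        self.closed = False

    async def receive_json(self) -> dict[str, Any]:
        return self.incoming.pop(0)

    async def send_json(self, data: dict[str, Any]) -> None:
        self.sent.append(data)

    async def close(self) -> None:
        self.closed = True


if __name__ == "__main__":
    ws = FakeWebSocket([{"type": "auth_required"}, {"type": "auth_ok"}])
    asyncio.run(authenticate(ws, "example-token"))
    print(ws.sent, ws.closed)
```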
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Fix pylint protected-access warnings in tests
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Check running container env before using Unix socket
Split use_unix_socket into two properties to handle the Supervisor
upgrade transition where Core is still running with a container
started by the old Supervisor (without SUPERVISOR_CORE_API_SOCKET):
- supports_unix_socket: version check only, used when creating the
Core container to decide whether to set the env var
- use_unix_socket: version check + running container env check, used
for communication decisions
This ensures TCP fallback during the upgrade transition while still
hard-failing if the socket is missing after Supervisor configured
Core to use it.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Improve Core API communication logging and error handling
- Remove transport log from make_request that logged before Core
container was attached, causing misleading connection logs
- Log "Connected to Core via ..." once on first successful API response
in get_api_state, when the transport is actually known
- Remove explicit socket existence check from session property, let
aiohttp UnixConnector produce natural connection errors during
Core startup (same as TCP connection refused)
- Add validation in get_core_state matching get_config pattern
- Restore make_request docstring
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Guard Core API requests with container running check
Add is_running() check to make_request and connect_websocket so no
HTTP or WebSocket connection is attempted when the Core container is
not running. This avoids misleading connection attempts during
Supervisor startup before Core is ready.
Also make use_unix_socket raise if container metadata is not available
instead of silently falling back to TCP. This is a defensive check
since is_running() guards should prevent reaching this state.
Add attached property to DockerInterface to expose whether container
metadata has been loaded.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Reset Core API connection state on container stop
Listen for Core container STOPPED/FAILED events to reset the
connection state: clear the _core_connected flag so the transport
is logged again on next successful connection, and close any stale
Unix socket session.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Only mount /run/supervisor if we use it
* Fix pytest errors
* Remove redundant is_running check from ingress panel update
The is_running() guard in update_hass_panel is now redundant since
make_request checks is_running() internally. Also mock is_running
in the websession test fixture since tests using it need make_request
to proceed past the container running check.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Bind mount /run/supervisor to Supervisor /run/os
Home Assistant OS (as well as the Supervised run scripts) bind mount
/run/supervisor to /run/os in Supervisor. Since we reuse this location
for the communication socket between Supervisor and Core, we need to
also bind mount /run/supervisor to Supervisor /run/os in CI.
* Wrap WebSocket handshake errors in HomeAssistantAPIError
Unexpected exceptions during the WebSocket handshake (KeyError,
ValueError, TypeError from malformed messages) are now wrapped in
HomeAssistantAPIError inside WSClient.connect/connect_with_auth.
This means callers only need to catch HomeAssistantAPIError.
Remove the now-unnecessary except (RuntimeError, ValueError,
TypeError) from proxy _websocket_client and add a proper error
message to the APIError per review feedback.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Narrow WebSocket handshake exception handling
Replace broad `except Exception` with specific exception types that
can actually occur during the WebSocket handshake: KeyError (missing
dict keys), ValueError (bad JSON), TypeError (non-text WS message),
aiohttp.ClientError (connection errors), and TimeoutError. This
avoids silently wrapping programming errors into HomeAssistantAPIError.
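The narrowing can be sketched as a wrapper that converts only the listed exception types. `json.JSONDecodeError` is a `ValueError` subclass, and since Python 3.11 `asyncio.TimeoutError` is an alias of `TimeoutError`, so both are covered. `aiohttp.ClientError` is left out only so the sketch runs with the standard library. `guarded_handshake` is a hypothetical name.

```python
import asyncio
import json
from collections.abc import Awaitable, Callable
from typing import TypeVar

T = TypeVar("T")


class HomeAssistantAPIError(Exception):
    """Local stand-in for Supervisor's exception."""


# The real list also includes aiohttp.ClientError; omitted to stay stdlib-only.
HANDSHAKE_ERRORS = (KeyError, ValueError, TypeError, TimeoutError)


async def guarded_handshake(handshake: Callable[[], Awaitable[T]]) -> T:
    """Wrap expected handshake failures; let programming errors propagate."""
    try:
        return await handshake()
    except HANDSHAKE_ERRORS as err:
        raise HomeAssistantAPIError(f"WebSocket handshake failed: {err!r}") from err


if __name__ == "__main__":
    async def malformed() -> object:
        return json.loads("{not json")

    try:
        asyncio.run(guarded_handshake(malformed))
    except HomeAssistantAPIError as err:
        print(type(err.__cause__).__name__)  # JSONDecodeError
```

Because the original exception is chained with `from err`, the cause stays visible in tracebacks while callers only need to catch `HomeAssistantAPIError`.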
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Remove unused create_mountpoint from MountBindOptions
The field was added but never used. The /run/supervisor host path
is guaranteed to exist since HAOS creates it for the Supervisor
container mount, so auto-creating the mountpoint is unnecessary.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Clear stale access token before raising on final retry
Move token clear before the attempt check in connect_websocket so
the stale token is always discarded, even when raising on the final
attempt. Without this, the next call would reuse the cached bad token
via _ensure_access_token's fast path, wasting a round-trip.
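A sketch of the ordering fix, assuming a token cache with a fast path like `_ensure_access_token`. `TokenCache`, `connect_with_retry` and `HomeAssistantAuthError` are hypothetical stand-ins. Because the token is cleared before the final-attempt check, even a call that ultimately raises leaves no stale token behind.

```python
import asyncio
from collections.abc import Awaitable, Callable


class HomeAssistantAuthError(Exception):
    """Stand-in for 'Core rejected the access token'."""


class TokenCache:
    """Hypothetical cache mirroring the _ensure_access_token fast path."""

    def __init__(self, fetch: Callable[[], Awaitable[str]]) -> None:
        self._fetch = fetch
        self.access_token: str | None = None

    async def ensure_access_token(self) -> str:
        if self.access_token is None:  # otherwise reuse the cached token
            self.access_token = await self._fetch()
        return self.access_token


async def connect_with_retry(
    cache: TokenCache,
    connect: Callable[[str], Awaitable[str]],
    attempts: int = 2,
) -> str:
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        token = await cache.ensure_access_token()
        try:
            return await connect(token)
        except HomeAssistantAuthError:
            # Drop the stale token *before* deciding to give up, so the next
            # call fetches a fresh one instead of reusing the rejected token.
            cache.access_token = None
            if attempt == attempts - 1:
                raise
    raise AssertionError("unreachable")


if __name__ == "__main__":
    async def fetch() -> str:
        return "fresh-token"

    async def connect(token: str) -> str:
        return f"connected with {token}"

    print(asyncio.run(connect_with_retry(TokenCache(fetch), connect)))
```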
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Add tests for Unix socket communication and Core API
Add tests for the new Unix socket communication path and improve
existing test coverage:
- Version-based supports_unix_socket and env-based use_unix_socket
- api_url/ws_url transport selection
- Connection lifecycle: connected log after restart, ignoring
unrelated container events
- get_api_state/check_api_state parameterized across versions,
responses, and error cases
- make_request is_running guard and TCP flow with real token fetch
- connect_websocket for both Unix and TCP (with token verification)
- WSClient.connect/connect_with_auth handshake success, errors,
cleanup on failure, and close with pending futures
Consolidate existing tests into parameterized form and drop synthetic
tests that covered very little.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Fix version/requirements parsing in setup.py, set version in Dockerfile
Remove the CI step that patched const.py with the detected version, and
set the version in the Dockerfile from the BUILD_VERSION argument instead.
Also, fix parsing of the version in `setup.py`: it wasn't working
because it split the file contents on "/n" instead of on newlines,
resulting in all installed Python packages having their version string
set to `9999.9.9.dev9999` instead of the expected version.
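This log does not show the actual `setup.py` code. The sketch below assumes it scans `const.py` line by line for the `SUPERVISOR_VERSION` assignment, and `CONST_PY` is an illustrative excerpt rather than the real file. The bug comes down to `"/n"` being a literal slash followed by `n` rather than the newline escape `"\n"`. Splitting on it returns the whole file as a single element, so the version line is never matched.

```python
CONST_PY = '''\
"""Constants file for Supervisor (illustrative excerpt)."""
SUPERVISOR_VERSION = "9999.09.9.dev9999"
'''


def read_version(source: str) -> str:
    """Return the quoted value of the SUPERVISOR_VERSION assignment."""
    for line in source.splitlines():  # splits on real line breaks
        if line.startswith("SUPERVISOR_VERSION"):
            return line.split("=", 1)[1].strip().strip('"')
    raise ValueError("SUPERVISOR_VERSION not found")


if __name__ == "__main__":
    print(read_version(CONST_PY))  # 9999.09.9.dev9999
    # The buggy split yields one element: the unsplit file.
    print(len(CONST_PY.split("/n")))  # 1
```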
(Note: setuptools normalizes version strings, so the installed package
has leading zeroes stripped and its second component is `9` instead of
the literal `09` used in the default string everywhere. Fixing this is
out of scope for this change, as the ideal solution would be to change
the versioning scheme, but it is worth noting.)
Lastly, clean up builder.yaml environment variables (crane is no longer
used after #6679).
* Generate setuptools-compatible version for PR builds
For PR builds, we use the plain commit SHA as the version. This is
perfectly fine for Docker tags, but it is not a valid Python package
version; fixing the setup.py bug surfaced this error. Work around it by
setting the package version to the string we used previously, suffixed
with the commit SHA as build metadata (delimited by `+`).
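PEP 440 calls the part after `+` a *local version label*. It may contain only ASCII letters, digits and dot-separated segments, which a hex commit SHA satisfies. The sketch accepts only that strict form. The base version and SHA in the usage line are hypothetical example inputs, and `pr_package_version` is an illustrative name, not the workflow's actual step.

```python
import re

# PEP 440 local version label: ASCII letters and digits, segments joined by "."
_LOCAL_LABEL = re.compile(r"[A-Za-z0-9]+(?:\.[A-Za-z0-9]+)*")


def pr_package_version(base_version: str, commit_sha: str) -> str:
    """Build '<base>+<sha>': the SHA becomes the local version label."""
    if not _LOCAL_LABEL.fullmatch(commit_sha):
        raise ValueError(f"not a valid local version label: {commit_sha!r}")
    return f"{base_version}+{commit_sha}"


if __name__ == "__main__":
    print(pr_package_version("9999.09.9.dev9999", "3f2a1bc"))
    # 9999.09.9.dev9999+3f2a1bc
```

The Docker tag can keep using the bare SHA. Only the Python package metadata needs the PEP 440 form.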
CI doesn't trigger a rebuild when a PR that changes only the workflow
is merged to the `main` branch. Since a workflow change is a legitimate
build trigger, add the workflow to the paths list.
The manifest step was failing because the image name wasn't set in the
env. We can also standardize the workflow by using the shared matrix
prepare step.
Publish the multi-arch manifest (`hassio-supervisor` image) after the
build finishes. The step's `needs` intentionally do not include the
`version` step: the build itself already publishes the arch-prefixed
images, so only the build has to finish before the manifest can be
published.
Closes #6646
In #6347 we dropped the build for deprecated architectures and started
re-tagging Supervisor 2025.11.5 images to make them available under
the tag of the latest version. This provided an interim period of
graceful update handling for devices that were not online when
Supervisor dropped support for these architectures. As support was
dropped almost 4 months ago, most users should hopefully have migrated
by now. The rest will now see Supervisor fail to update, with no
message about the architecture drop, if they update from a too-old
version. Since the re-tagged version also reported a failure to update,
the impact isn't too bad.
* Migrate builder workflow to new builder actions
Migrate Supervisor image build to new builder actions. The resulting images
should be identical to those built by the builder.
Refs #6646 - does not implement multi-arch manifest publishing (will be done in
a follow-up)
* Update devcontainer version to 3
* Use Python 3.14(.3) in CI and base image
Update base image to the latest tag using Python 3.14.3 and update Python
version in CI workflows to 3.14.
With Python 3.14, backports.zstd is no longer necessary, as zstd support
is now available in the standard library (`compression.zstd`).
* Update wheels ABI in the wheels builder to cp314
* Use explicit Python fix version in GH actions
Specify Python 3.14.3 explicitly, as the setup-python action otherwise
defaults to 3.14.2 instead of 3.14.3, leading to different versions in CI
and in production.
* Update Python version references in pyproject.toml
* Fix all ruff quoted-annotation (UP037) errors
* Revert unquoting of DBus types in tests and ignore UP037 where needed
* Fix getting Supervisor IP address in testing
Newer Docker versions (probably newer than 29.x) no longer have a global
IPAddress attribute under .NetworkSettings. Instead, there is a
network-specific map under Networks; in our case, the hassio network
holds the relevant IP address. These network-specific maps already
existed in older versions, so the new inspect format works with old as
well as new Docker versions.
While at it, also adjust the test fixture.
* Actively wait for hassio IPAddress to become valid
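The CI change presumably reads the address with a `docker inspect` format template. The sketch below does the equivalent on `docker inspect`'s JSON output (an array of container objects), reading `NetworkSettings.Networks.hassio.IPAddress` and polling until it is non-empty. The timeout, interval and injected `sleep`/`clock` are assumptions chosen so the loop can be tested without Docker.

```python
import json
import time
from collections.abc import Callable


def hassio_ip(inspect_output: str) -> str:
    """Read the IP from the per-network map, present in old and new Docker."""
    container = json.loads(inspect_output)[0]  # docker inspect prints a JSON array
    networks = container.get("NetworkSettings", {}).get("Networks") or {}
    return networks.get("hassio", {}).get("IPAddress", "")


def wait_for_hassio_ip(
    inspect: Callable[[], str],
    timeout: float = 30.0,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll until the hassio network reports a non-empty IP address."""
    deadline = clock() + timeout
    while True:
        address = hassio_ip(inspect())
        if address:
            return address
        if clock() >= deadline:
            raise TimeoutError("hassio network IP address did not become valid")
        sleep(interval)
```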
* Raise HomeAssistantWSError when Core WebSocket is unreachable
Previously, async_send_command silently returned None when Home Assistant
Core was not reachable, leading to misleading error messages downstream
(e.g. "returned invalid response of None instead of a list of users").
Refactor _can_send to _ensure_connected which now raises
HomeAssistantWSError on connection failures while still returning False
for silent-skip cases (shutdown, unsupported version). async_send_message
catches the exception to preserve fire-and-forget behavior.
Update callers that don't handle HomeAssistantWSError: _hardware_events
and addon auto-update in tasks.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Simplify HomeAssistantWebSocket command/message distinction
The WebSocket layer had a confusing split between "messages" (fire-and-forget)
and "commands" (request/response) that didn't reflect Home Assistant Core's
architecture where everything is just a WS command.
- Remove dead WSClient.async_send_message (never called)
- Rename async_send_message → _async_send_command (private, fire-and-forget)
- Rename send_message → send_command (sync wrapper)
- Simplify _ensure_connected: drop message param, always raise on failure
- Simplify async_send_command: always raise on connection errors
- Remove MIN_VERSION gating (minimum supported Core is now 2024.2+)
- Remove begin_backup/end_backup version guards for Core < 2022.1.0
- Add debug logging for silently ignored connection errors
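The resulting raise-versus-ignore split can be sketched as below. The method names `_ensure_connected`, `async_send_command`, `_async_send_command` and `send_command` come from this log. Everything inside them is an assumption: catching `OSError` stands in for the real connection errors, as do the `client.send` call, the background-task set, and `send_command` returning its task.

```python
import asyncio
import logging
from collections.abc import Awaitable, Callable
from typing import Any

_LOGGER = logging.getLogger(__name__)


class HomeAssistantWSError(Exception):
    """Local stand-in for Supervisor's exception."""


class WebSocketSketch:
    """Hypothetical reduction of HomeAssistantWebSocket to the raise/ignore split."""

    def __init__(self, connect: Callable[[], Awaitable[Any]]) -> None:
        self._connect = connect
        self._client: Any = None
        self._tasks: set[asyncio.Task[None]] = set()

    async def _ensure_connected(self) -> Any:
        if self._client is None:
            try:
                self._client = await self._connect()
            except OSError as err:  # e.g. connection refused
                raise HomeAssistantWSError(f"Can't connect to Core: {err}") from err
        return self._client

    async def async_send_command(self, command: dict[str, Any]) -> Any:
        """Request/response: connection failures always raise."""
        client = await self._ensure_connected()
        return await client.send(command)

    async def _async_send_command(self, command: dict[str, Any]) -> None:
        """Fire-and-forget: failures are logged at debug level and ignored."""
        try:
            await self.async_send_command(command)
        except HomeAssistantWSError as err:
            _LOGGER.debug("Ignoring %s, Core unreachable: %s", command.get("type"), err)

    def send_command(self, command: dict[str, Any]) -> asyncio.Task[None]:
        """Sync wrapper: schedules the fire-and-forget send on the running loop."""
        task = asyncio.create_task(self._async_send_command(command))
        self._tasks.add(task)  # keep a reference so the task isn't garbage collected
        task.add_done_callback(self._tasks.discard)
        return task
```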
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Wait for Core to come up before backup
This is crucial since the WebSocket command to Core now fails with the
new error handling if Core is not running yet.
* Wait for Core install job instead
* Use CLI to fetch jobs instead of Supervisor API
The Supervisor API needs an authentication token, which is not
available at this point in the workflow. Instead of fetching the token,
we can use the CLI, which is available in the container.
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
The CLI calls in the tests still use the deprecated add-ons terminology,
causing deprecation warnings. Change the commands and flags to the new ones.
The retrieve-changed-files action only supports pull_request and push
events. Restrict the "Get changed files" step to those event types so
manual workflow_dispatch runs no longer fail. Also always build wheels
on manual dispatches since there are no changed files to compare against.
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Remove the automated frontend update workflow and version tracking file
as the frontend repository no longer builds supervisor-specific assets.
Frontend updates will now follow a different distribution mechanism.
Related to home-assistant/frontend#29132
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
* Improve Supervisor startup wait logic in CI workflow
The 'Wait for Supervisor to come up' step was failing intermittently when
the Supervisor API wasn't immediately available. The original script relied
on bash's lenient error handling in command substitution, which could fail
unpredictably.
Changes:
- Use curl -f flag to properly handle HTTP errors
- Use jq -e for robust JSON validation and exit code handling
- Add explicit 5-minute timeout with elapsed time tracking
- Reduce log noise by only reporting progress every 15 seconds
- Add comprehensive error diagnostics on timeout:
* Show last API response received
* Dump last 50 lines of Supervisor logs
- Show startup time on success for performance monitoring
This makes the CI workflow more reliable and easier to debug when the
Supervisor fails to start.
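The workflow step itself is bash using `curl -f` and `jq -e`. The loop below mirrors its control flow in Python: a hard timeout, elapsed-time tracking, progress output throttled to every 15 seconds, the last response reported on timeout, and the startup time reported on success. `fetch` and `is_ready` are hypothetical hooks. The readiness condition and JSON shape are assumptions, and the Supervisor log dump is omitted.

```python
import json
import time
from collections.abc import Callable
from typing import Any


def wait_for_supervisor(
    fetch: Callable[[], str],
    is_ready: Callable[[Any], bool],
    timeout: float = 300.0,
    interval: float = 1.0,
    progress_every: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
    log: Callable[[str], None] = print,
) -> float:
    """Poll until is_ready() accepts the response; return the startup time."""
    start = clock()
    next_progress = start + progress_every
    last_response = "<none>"
    while True:
        try:
            last_response = fetch()  # should raise on HTTP errors, like curl -f
            if is_ready(json.loads(last_response)):
                elapsed = clock() - start
                log(f"Supervisor is up after {elapsed:.0f}s")
                return elapsed
        except (OSError, ValueError):
            pass  # not reachable yet or not valid JSON: keep polling
        now = clock()
        if now - start >= timeout:
            log(f"Timed out after {timeout:.0f}s, last response: {last_response}")
            raise TimeoutError("Supervisor did not come up")
        if now >= next_progress:  # throttle progress output
            log(f"Still waiting for Supervisor ({now - start:.0f}s elapsed)")
            next_progress += progress_every
        sleep(interval)
```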
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* Use YAML anchor to deduplicate wait step in CI workflow
The 'Wait for Supervisor to come up' step appears twice in the
run_supervisor job - once after starting and once after restarting.
Use a YAML anchor to define the step once and reference it on the
second occurrence.
This reduces duplication by 28 lines and makes future maintenance
easier by ensuring both wait steps remain identical.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
---------
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>