* Add resolution check for duplicate OS installations
* Only create single issue/use separate unhealthy type
* Check MBR partition UUIDs as well
* Use partlabel
* Use generator to avoid code duplication
* Add list of devices, avoid unnecessary exception handling
* Run check only on HAOS
* Fix message formatting
* Fix and simplify pytests
* Fix UnhealthyReason sort order
* Check for duplicate data disks only when the OS is available
Supervised installations do not have a specific data disk, so only
check for duplicate data disks on Home Assistant OS.
* Enable OS for multiple data disks check test
* Drop ensure_builtin_repositories
With the new Repository classes we have the is_builtin property, so we
can easily make sure that built-ins are not removed. This allows us to
further clean up the code by removing the ensure_builtin_repositories
function and the ALL_BUILTIN_REPOSITORIES constant.
* Make sure we add built-ins on load
* Reuse default set and avoid unnecessary copy
Reuse default set and avoid unnecessary copying during validation if
the default is not being used.
* Add Supervisor connectivity check after DNS restart
When the DNS plug-in got restarted, check Supervisor connectivity
in case the DNS plug-in configuration change influenced Supervisor
connectivity. This is helpful when a DHCP server gets started after
Home Assistant is up. In that case the network provided DNS server
(local DNS server) becomes available after the DNS plug-in restart.
Without this change, the Supervisor connectivity will remain false
until a Job triggers a connectivity check, for example the
periodic update check by Core (which causes an updater and store
reload).
* Fix pytest and add coverage for new functionality
* Rename repository fixture to test_repository
Also don't remove the built-in repositories. The list was incomplete,
and tests don't seem to require that anymore.
* Get rid of StoreType
The type doesn't have much value, we have constant strings anyways.
* Introduce types.py
* Use slug to determine which repository urls to return
* Simplify BuiltinRepository enum
* Mock GitRepo load
* Improve URL handling and repository creation logic
* Refactor update_repositories
* Get rid of get_from_url
It is no longer used in production code.
* More refactoring
* Address pylint
* Introduce is_git_based property to Repository class
Return all git based URLs, including the Core repository.
* Revert "Introduce is_git_based property to Repository class"
This reverts commit dfd5ad79bf23e0e127fc45d97d6f8de0e796faa0.
* Fold type.py into const.py
Align more with how Supervisor code is typically structured.
* Update supervisor/store/__init__.py
Co-authored-by: Mike Degatano <michael.degatano@gmail.com>
* Apply repository remove suggestion
* Fix tests
---------
Co-authored-by: Mike Degatano <michael.degatano@gmail.com>
When authentication using JSON payload or URL encoded payload fails,
use the generic HTTP response code 401 Unauthorized instead of 400
Bad Request.
This is a more appropriate response code for authentication errors
and is consistent with the behavior of other authentication methods.
* Improve DNS plug-in restart
Instead of simply going by PrimaryConnection changes, use the DnsManager
Configuration property. This property is ultimately used to write the
DNS plug-in configuration, so it is really the relevant information
we pass on to the plug-in.
* Check for changes and restart DNS plugin
* Check for changes in plug-in DNS
Cache the last local (NetworkManager) provided DNS servers. Check against
this DNS server list when deciding when to restart the DNS plug-in.
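A rough sketch of the change detection, with hypothetical names (only
notify_locals_changed() below appears in the actual code, see the tests
mentioned later):
```python
class PluginDns:
    """Hypothetical sketch: restart the plug-in only when the host
    (NetworkManager) provided DNS servers actually changed."""

    def __init__(self) -> None:
        self._cached_locals: list[str] | None = None

    async def notify_locals_changed(self, locals_: list[str]) -> None:
        if locals_ == self._cached_locals:
            return  # configuration unchanged, no restart needed
        self._cached_locals = locals_
        await self.restart()

    async def restart(self) -> None:
        ...  # write config and restart the DNS plug-in container
```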
* Check connectivity unthrottled in certain situations
* Fix pytest
* Fix pytest
* Improve test coverage for DNS plugins restart functionality
* Apply suggestions from code review
Co-authored-by: Mike Degatano <michael.degatano@gmail.com>
* Debounce local DNS changes and event based connectivity checks
* Remove connection check logic
* Remove unthrottled connectivity check
* Fix delayed call
* Store restart task and cancel in case a restart is running
* Improve DNS configuration change tests
* Remove stale code
* Improve DNS plug-in tests, less mocking
* Cover multiple private functions at once
Improve tests around notify_locals_changed() to cover multiple
functions at once.
---------
Co-authored-by: Mike Degatano <michael.degatano@gmail.com>
* Add GitHub Copilot/Claude instruction
This adds an initial instruction file for GitHub Copilot and Claude
(CLAUDE.md symlinked to the same file).
* Add --ignore-missing-imports to mypy, add note to run pre-commit
* Use Docker BuildKit to build addons
* Improve error message as suggested by CodeRabbit
* Fix container.remove() tests missing v=True
* Ignore squash rather than falling back to legacy builder
* Use version rather than tag to avoid confusion in run_command()
* Fix tests differently
* Use PropertyMock like other tests
* Restore position of fix_label fn
* Exempt addon builder image from unsupported checks
* Refactor tests
* Fix tests expecting wrong builder image
* Remove hardcoded paths
* Fix tests
* Remove get_addon_host_path() function
* Use docker buildx build rather than docker build
Co-authored-by: Stefan Agner <stefan@agner.ch>
---------
Co-authored-by: Stefan Agner <stefan@agner.ch>
The token_validation() code in the security middleware potentially
accesses the access token property before the Supervisor starts the
CLI/Observer plugins, which leads to a KeyError when trying to access
the `access_token` property. This change ensures that no KeyError is
raised and None is returned instead.
Feature requests are now collected using the org-wide GitHub Community.
Update the link accordingly.
While at it, also remove the unused ISSUE_TEMPLATE.md and align the
title for creating issues with what is used in Home Assistant Core's
template.
* Rename detect-blocking-io API value to match other APIs
For the new detect-blocking-io option, use dashes instead of
underscores in `on-at-startup` for consistency with other API
endpoints.
This is a breaking change, but since the API is really new and not
really used yet, it is fairly safe to do so.
* Fix pytest
* Fix mypy issues in store module
* Fix mypy issues in utils module
* Fix mypy issues in all remaining source files
* Fix ingress user typeddict
* Fixes from feedback
* Fix mypy issues after installing docker-types
Don't label new issues with the bug label by default. We started making
use of issue types, so if anything, this should be type "Bug". However,
we prefer to leave the type unspecified until the issue has been triaged.
Expose the unique machine ID of the local system via the Supervisor
API. This allows identifying a particular machine across reboots,
backup restores and updates. The machine ID is a stable identifier
that does not change unless the underlying hardware is changed or
the operating system is reinstalled.
* Fix mypy errors in misc and mounts
* Fix mypy issues in os module
* Fix typing of capture_exception
* avoid unnecessary property call
* Fixes from feedback
* Avoid aiodns resolver memory leak
In certain cases, the aiodns resolver can leak memory. This also
leads to a fatal `Python error… ffi.from_handle()`. This addresses
the issue by ensuring that the resolver is properly closed
when it is no longer needed.
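The pattern, as a minimal sketch using aiohttp's AsyncResolver wrapper
around aiodns (hypothetical helper, not the actual Supervisor code):
```python
import asyncio

from aiohttp import AsyncResolver

async def lookup(host: str) -> list[dict]:
    resolver = AsyncResolver()  # c-ares based, backed by aiodns
    try:
        return await resolver.resolve(host, 443)
    finally:
        # Release the c-ares channel explicitly so its handles
        # don't linger and leak memory.
        await resolver.close()

asyncio.run(lookup("example.com"))
```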
* Address coderabbitai feedback
* Fix pytest
* Fix pytest
Configurable and w/ migrations between IPv4-Only and Dual-Stack
Signed-off-by: David Rapan <david@rapan.cz>
Co-authored-by: Stefan Agner <stefan@agner.ch>
This reverts commit 63fde3b4109310e95ebdcc8e3c23a04ff96ba592.
This change introduced another more severe regression, causing all
add-ons that haven't been started since Supervisor startup to cause
errors during their backup. A more sophisticated check would have to be
implemented to address edge cases during backups for non-existing
add-ons (or their config actually).
Fixes #5924
* Avoid early DNS plug-in start
A connectivity check can potentially be triggered before the DNS
plug-in is loaded. Avoid calling restart on the DNS plug-in before
it got initially loaded. This prevents starting before attaching.
The attaching makes sure that the DNS plug-in container is recreated
before the DNS plug-in is initially started, which is e.g. needed
by a potential hassio network configuration change (e.g. the
migration required to enable/disable IPv6 on the hassio network,
see #5879).
* Mock DNS plug-in running
Instead of starting a task in the background, synchronously wait
for the Supervisor start sequence to complete. This should be
functionally equivalent, as we would loop forever in the event loop
just afterwards anyway.
The advantage is that we now can catch any exceptions during the
start sequence and log any errors with critical logging, reporting
them to Sentry if enabled. It also avoids "Task exception was never
retrieved" errors. Reporting errors is especially important since we
can't use the asyncio Sentry integration (see #5729 for details).
Also handle early add-on start errors just like other add-on start
errors (make sure the finally block is executed as well). And finally,
register signal handlers synchronously. There is no real benefit in
doing them asynchronously, and it avoids a potential race condition.
* Use journal-gatewayd's new /boots endpoint to list boots
The current method we use for getting boots has several known downsides,
for example it can miss some incomplete boots, and the performance might
be worse than what we could get by using Systemd directly. Systemd was
missing a method to list boots through the journal-gatewayd, but that
should be addressed by the new /boots endpoint added in [1], which
returns an application/json-seq response containing all boots as
reported by `journalctl --list-boots`.
Implement Supervisor methods to parse this format and use the endpoint
at first, falling back to the old method if it fails.
[1] https://github.com/systemd/systemd/pull/37574
* Log info instead of warning when /boots is not present
Co-authored-by: Stefan Agner <stefan@agner.ch>
* Split records only by RS instead of LF in journal_boots_reader
* Strip only RS, json.loads is fine with whitespace
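A minimal sketch of parsing such a response, assuming RS-separated
records per RFC 7464 (the actual reader works incrementally on the
response stream):
```python
import json
from typing import Any, Iterator

RS = "\x1e"  # RFC 7464 record separator used by application/json-seq

def parse_json_seq(payload: str) -> Iterator[Any]:
    # Split on RS only; json.loads tolerates the trailing newline.
    for record in payload.split(RS):
        if record.strip():
            yield json.loads(record)

boots = list(parse_json_seq('\x1e{"index": -1, "boot_id": "abc"}\n'))
```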
---------
Co-authored-by: Stefan Agner <stefan@agner.ch>
When a backup error occurs, it might be that the backup file hasn't
been created yet, e.g. when there is no space or no permission on
the target backup directory. Deleting the backup file would fail
in this case. Use missing_ok instead to ignore a missing backup file
on delete.
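The gist of the fix, as a sketch (the path is illustrative):
```python
from pathlib import Path

# Don't fail if the backup file was never created, e.g. because the
# target directory had no free space or no write permission.
Path("/backup/failed_backup.tar").unlink(missing_ok=True)
```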
With #5897 we renamed addon to addon_config and vice-versa. The log
messages were still using the previous variable names, leading to an
UnboundLocalError.
Fix the log messages to use the correct variable names.
To avoid accidental writes to the Supervisor root filesystem, we might
use the Docker read-only mode at one point. This is not yet the default,
but using s6-overlay with the read-only flag seems not to have any
downsides. So enable this by default.
To start Supervisor with a read-only root file system the following
arguments have to be used: `--read-only --tmpfs /run:exec`.
* Avoid initializing Blockbuster on Supervisor info call
Instead of creating an instance of Blockbuster to simply check if
Blockbuster is enabled, use a global variable to store the instance
of Blockbuster and only initialize it when needed. This avoids
unnecessary initialization of Blockbuster when it is not required.
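A rough sketch of the idea, assuming the blockbuster package's
BlockBuster class (the method names here are illustrative):
```python
from blockbuster import BlockBuster

class BlockBusterManager:
    """Hold a single global instance; construct it only on activation."""

    _instance: BlockBuster | None = None

    @classmethod
    def is_enabled(cls) -> bool:
        # An info call no longer triggers BlockBuster initialization.
        return cls._instance is not None

    @classmethod
    def activate(cls) -> None:
        if cls._instance is None:
            cls._instance = BlockBuster()
        cls._instance.activate()
```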
* Update supervisor/utils/blockbuster.py
Co-authored-by: Jan Čermák <sairon@users.noreply.github.com>
* Fix merge and rename singleton class to BlockBusterManager
* Fix pytest
---------
Co-authored-by: Jan Čermák <sairon@users.noreply.github.com>
Process NetworkManager interface updates in case PrimaryConnection
changes. This makes sure that the /network/interface/default/info
endpoint can be used to get the IP address of the primary interface.
* Use add-on config timestamp to determine add-on update age
Instead of using the current timestamp when loading the add-on config,
simply use the add-on config modification timestamp. This way, we can
get a timestamp even when Supervisor got restarted. It also simplifies
the code a bit.
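The idea in a nutshell (hypothetical helper name):
```python
import time
from pathlib import Path

def addon_update_age(config_path: Path) -> float:
    """Seconds since the add-on config was last written; unlike an
    in-memory load timestamp this survives Supervisor restarts."""
    return time.time() - config_path.stat().st_mtime
```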
* Fix pytest
* Patch stat() instead of modifying fixture files
* Stop reading advanced logs on ConnectionError
If the client side connection closes with a `ConnectionError`, stop
reading the advanced logs.
This is very similar to ClientConnectionResetError, which is easily
reproducible by having a log open and following it in the browser and
then restarting Home Assistant. So far I wasn't able to artificially
reproduce the ConnectionError, but there are quite a few reports on
Sentry, so it seems to happen in the real world.
* Warn on ConnectionError
* feat: Add IPv6 address generation mode & privacy extensions
Signed-off-by: David Rapan <david@rapan.cz>
* Use NetworkManager fixture for settings init tests
This fixes the test, since the extended implementation now reads
the version of NetworkManager.
* Add pytest for addr_gen_mode
---------
Signed-off-by: David Rapan <david@rapan.cz>
Co-authored-by: Stefan Agner <stefan@agner.ch>
* Trigger auto-update through Core WebSocket call
Instead of auto-updating add-ons on Supervisor side trigger an update
through Core via a WebSocket command. This makes sure that the backup
is categorized correctly and all backup features like retention are
applied.
* Add pytest
* Fix pytest
* Fix pytest
* Fix pytest
* Fix pytest
* Fix pytest cleaner
* Set timestamp of add-on far into the past
Instead of copying the backup in the main job, let's copy them in a
separate job per location. This allows using the same backup error
handling mechanism as for add-ons and folders.
This makes the stage introduced in #5784 somewhat redundant, but
before removing it, let's see if this approach works out.
* Harmonize folder and add-on backup error handling
Align add-on and folder backup error handling in that in both cases
errors are recorded on the respective backup Jobs, but not raised to
the caller. This allows the backup to complete successfully even if
some add-ons or folders fail to back up.
Along with this, also record errors in the per-add-on and per-folder
backup jobs, as well as the add-on and folder root job.
And finally, align the exception handling to only catch expected
exceptions for add-ons too.
* Fix pytest
In case the c-ares based AsyncResolver fails to initialize, let's
fallback to the threaded resolver. This would have helped for aiodns
3.3.0 issue when clients were unable to allocate more inotify watches.
This is fixed in aiodns 3.4.0, but we should still fallback to the
threaded resolver as a precautionary measure.
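A minimal sketch of the fallback, assuming aiohttp's resolver classes
(the actual code wires this into the ClientSession connector):
```python
from aiohttp import AsyncResolver, ThreadedResolver
from aiohttp.abc import AbstractResolver

def create_resolver() -> AbstractResolver:
    try:
        # c-ares based resolver; initialization can fail, e.g. when no
        # more inotify watches can be allocated (aiodns 3.3.0).
        return AsyncResolver()
    except Exception:
        # Precautionary fallback to the thread-pool based resolver.
        return ThreadedResolver()
```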
* Handle non-existing addon config dir
Since users have access to the root of all add-on config directories,
they can delete the directory of an add-on at any time. Hence we need
to handle it gracefully if it doesn't exist anymore.
* Add pytest
When the systemd-journal-gatewayd service is being shut down while
Supervisor is still trying to read logs, aiohttp throws a
ClientPayloadError, presumably because we try to read until the next
linefeed, which aiohttp cannot satisfy anymore.
Simply catch the exception just like the connection reset errors
previously in #5358 and #5715.
* Recreate aiohttp ClientSession after DNS plug-in load
Create a temporary ClientSession early in case we need to load version
information from the internet. This doesn't use the final DNS setup
and hence might fail to load in certain situations since we don't have
the fallback mechanisms in place yet. But if the DNS container image
is present, we'll continue the setup and load the DNS plug-in. We then
can recreate the ClientSession such that it uses the DNS plug-in.
This works around an issue with aiodns, which today doesn't reload
`resolv.conf` automatically when it changes. This led to Supervisor
using the initial `resolv.conf` as created by Docker. It meant that
we did not use the DNS plug-in (and its fallback capabilities) in
Supervisor. Also it meant that changes to the DNS setup at runtime
did not propagate to the aiohttp ClientSession (as observed in #5332).
* Mock aiohttp.ClientSession for all tests
Currently in several places pytest actually uses the aiohttp
ClientSession and reaches out to the internet. This is not ideal
for unit tests and should be avoided.
This creates several new fixtures to aid this effort: The `websession`
fixture simply returns a mocked aiohttp.ClientSession, which can be
used whenever a function is tested which needs the global websession.
A separate new fixture to mock the connectivity check named
`supervisor_internet` since this is often used through the Job
decorator which requires INTERNET_SYSTEM.
And the `mock_update_data` uses the already existing update json
test data from the fixture directory instead of loading the data
from the internet.
* Log ClientSession nameserver information
When recreating the aiohttp ClientSession, log information what
nameservers exactly are going to be used.
* Refuse ClientSession initialization when API is available
Previous attempts to reinitialize the ClientSession have shown
use of the ClientSession after it was closed due to API requests
being handled in parallel to the reinitialization (see #5851).
Make sure this is not possible by refusing to reinitialize the
ClientSession when the API is available.
* Fix pytests
Also make sure we don't create aiohttp ClientSession objects unnecessarily.
* Apply suggestions from code review
Co-authored-by: Jan Čermák <sairon@users.noreply.github.com>
---------
Co-authored-by: Jan Čermák <sairon@users.noreply.github.com>
Similar to #5825, make sure we mock the systemd journal gateway socket
for tests. This makes the test work on systems which have
systemd-journal-gatewayd installed.
* Check local store repository for changes
Instead of simply assuming that the local store repository got changed,
use mtime to check if there have been any changes to the local store.
This mimics a similar behavior to the git repository store updates.
Before this change, we end up in the updated repo code path, which
caused a re-read of all add-ons on every store reload, even though
nothing changed at all. Store reloads are triggered by Home Assistant
Core every 5 minutes.
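One way to implement such a check, as a sketch (hypothetical helper;
the real code also takes the root directory into account, see below):
```python
from pathlib import Path

def latest_mtime(store_root: Path) -> float:
    """Newest modification time under the local store tree; compare it
    against the value from the previous reload to detect changes."""
    return max(
        (path.stat().st_mtime for path in store_root.rglob("*")),
        default=store_root.stat().st_mtime,
    )
```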
* Fix pytest failure
Now that we actually only reload metadata if the local store changed
we have to fake the change as well to fix the store manager tests.
* Fix path cache update test for local store repository
* Take root directory into account/add pytest
* Rename utils/__init__.py tests to test_utils_init.py
When uninstalling an add-on, we schedule a task to reload the ingress
tokens. This scheduled task typically ends up running right after
clearing the add-on data with `self.sys_addons.data.uninstall(self)`
(since this task is doing I/O, the race is rather deterministic).
Let's make sure we reload the ingress tokens at the end. Also simply
execute reloading synchronously since this is a rather quick operation
and makes sure that errors would get attributed to the right add-on
uninstall operation.
* Improve backup upload location determination
For local backup upload locations, check if the location is on the same
file system and thus allows moving the backup file after upload. This
allows custom backup mounts. Currently there is no documented,
persistent way to create such mounts with Home Assistant OS
installations, but since we might add local mounts in the future this
seems a worthwhile addition.
Fixes: #5837
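The check boils down to comparing device IDs, as in this sketch
(hypothetical helper name):
```python
import os
from pathlib import Path

def same_filesystem(upload_path: Path, target_dir: Path) -> bool:
    """True when both paths live on the same device, so the uploaded
    backup can be moved with a cheap rename instead of a copy."""
    return os.stat(upload_path).st_dev == os.stat(target_dir).st_dev
```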
* Fix pytests
So far a store reload led to a reload of all add-ons twice, usually
causing two messages in quick succession:
```
2025-04-25 17:01:05.058 INFO (MainThread) [supervisor.store] Loading add-ons from store: 91 all - 0 new - 0 remove
2025-04-25 17:01:05.058 INFO (MainThread) [supervisor.store] Loading add-ons from store: 91 all - 0 new - 0 remove
```
This is because when repository changes are detected, `reload()` calls
`load()` which then calls `update_repositories()` which ends up calling
`_read_addons()`, while `reload()` itself calls `_read_addons()` after
`load()` as well.
One way to fix this would be to simply remove the `_read_addons()` call
in `reload()`.
However, it seems the `update_repositories()` call (via `load()`)
is not necessary at all, as we don't add new store repositories in
`reload()`, and we already made sure the built-ins are present on
startup.
So simply call `data.update()` to update the store data cache, as it
was the case before #2225. There is no apparent reason documented why
`data.update()` was changed to a `load()` call. It might be to ensure
regularly that built-in repositories are still in the list of store
repositories. But this type of regular invariant check is often harmful
as it might hide bugs in other places.
Supervisor will still call `update_repositories()` in `load()` to
ensure all built-in repositories are present, just in case the local
configuration file got modified or corrupted.
This reverts commit 1504278223a8d852d3b11de25f676fa8d6501a70.
It turns out that recreating the session can cause race conditions, e.g.
with API checks triggered by proxied requests running alongside. These
manifest in the following error:
AttributeError: 'NoneType' object has no attribute 'connect'
...
File "supervisor/homeassistant/api.py", line 187, in check_api_state
if state := await self.get_api_state():
File "supervisor/homeassistant/api.py", line 171, in get_api_state
data = await self.get_core_state()
File "supervisor/homeassistant/api.py", line 145, in get_core_state
return await self._get_json("api/core/state")
File "supervisor/homeassistant/api.py", line 132, in _get_json
async with self.make_request("get", path) as resp:
File "contextlib.py", line 214, in __aenter__
return await anext(self.gen)
File "supervisor/homeassistant/api.py", line 106, in make_request
async with getattr(self.sys_websession, method)(
File "aiohttp/client.py", line 1425, in __aenter__
self._resp: _RetType = await self._coro
File "aiohttp/client.py", line 703, in _request
conn = await self._connector.connect(
The only reason for the _connector in the aiohttp client to be None is
when close() gets called on the session. The only place this is being
done is the connectivity check.
So it seems that between fetching the session from the `sys_websession`
property and actually using the connector, a connectivity check has
been run, which then causes the above stack trace.
Let's not mess with the lifetime of the ClientSession object and simply
revert the change. Another solution for the original problem needs to be
found.
* Add basic test coverage for /auth API
* Check /auth API is called from an add-on
Currently the /auth API is only available for add-ons. Return 403
for calls not originating from an add-on.
* Handle bad json in auth API
Use the API specific JSON load helper which raises an APIError. This
causes the API to return a 400 error instead of a 500 error when the
JSON is invalid.
* Avoid redefining name 'mock_check_login'
* Update tests/api/test_auth.py
* Add dedicated update information reload
Currently we have the /refresh_updates endpoint which updates the main
component versions (Core, OS, Supervisor, Plug-ins) and the add-on
store at the same time. This combined update causes more update
information reloads than necessary.
To allow fine-grained update refresh control, introduce a new endpoint
/reload_updates which asks Supervisor to only update main component
versions (learned through the version json files).
The /store/reload endpoint already allows updating the add-on store
separately.
* Add pytest
* Update supervisor/api/__init__.py
* Unify Supervisor event message functions
Unify functions which send WebSocket messages of type
"supervisor/event". This deduplicates code and hopefully avoids further
diversification in the future.
While at it, remove unused HomeAssistantWSNotSupported exception. It
seems the only place this exception is used got removed in #3317.
* Test message delivery during shutdown states
Similar to the timezone, also add country information to the Supervisor
info. This is useful to set country-specific configurations such as
the wireless radio regulatory setting. This is also useful for add-ons
which need country information but only have hassio API access.
Since Systemd v256 the Range header must not end with a trailing colon.
We relied on this undocumented feature when following logs, and the
frontend or CLI may still use it in requests. To fix the requests
failing with the new Systemd version, intercept the header and fill in
num_entries to the maximum possible value, which avoids journal-gatewayd
returning the response prematurely and also works on older Systemd
versions.
The journal-gatewayd would still return the response if the follow flag is used
along with num_entries, but this behavior is unchanged and would be
better fixed in the backend.
Link: https://github.com/systemd/systemd/issues/37172
The dependency typing-extensions has been added with #3848, but not
used much. With the update to Python 3.11 in #4666 the necessary types
are now available from the Python standard library, making
typing-extensions unused. Remove the now unnecessary typing-extensions
from the dependencies.
* Fix root path requests
Since #5759 we've tried to access the path explicitly. However, this
raises a KeyError exception when trying to access the proxied root path
(e.g. http://supervisor/core/api/). Before #5759 get was used, which
led to no exception, but instead inserted a `None` into the path.
It seems aiohttp doesn't provide a path when the root is accessed. So
simply convert this to no path as well by setting path to an empty
string.
* Add rudimentary pytest for regular proxy requests
* Fix mypy issues in backups module
* Fix mypy issues in dbus module
* Fix mypy issues in api after rebase
* TypedDict to dataclass and other small fixes
* Finish fixing mypy errors in dbus
* local_where must exist
* Fix references to name in tests
* Improve Home Assistant Core WebSocket proxy implementation
This change removes unnecessary task creation for every WebSocket
message and instead creates just two tasks, one for each direction.
This improves performance by about a factor of 3 when measuring 1000
WebSocket requests to Core (from ~530ms to ~160ms).
While at it, also handle all WebSocket messages related to closing the
WebSocket and report all other errors as warnings instead of just info.
* Improve logging and error handling
* Add WS client error test case
* Use asyncio.gather directly
* Use asyncio.wait to handle exceptions gracefully
* Drop cancellation handling and correctly wait for the other proxy task
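A simplified sketch of the two-task pattern, assuming aiohttp on both
ends (not the actual Supervisor implementation):
```python
import asyncio

from aiohttp import ClientSession, WSMsgType, web

async def _forward(source, target) -> None:
    """Pump messages in one direction until the peer closes."""
    async for msg in source:
        if msg.type == WSMsgType.TEXT:
            await target.send_str(msg.data)
        elif msg.type == WSMsgType.BINARY:
            await target.send_bytes(msg.data)
        else:  # CLOSE/CLOSING/ERROR ends this direction
            break

async def proxy(request: web.Request, core_ws_url: str) -> web.WebSocketResponse:
    client_ws = web.WebSocketResponse()
    await client_ws.prepare(request)
    async with ClientSession() as session:
        async with session.ws_connect(core_ws_url) as core_ws:
            # Two long-lived tasks, one per direction, instead of a new
            # task per message.
            tasks = {
                asyncio.create_task(_forward(client_ws, core_ws)),
                asyncio.create_task(_forward(core_ws, client_ws)),
            }
            done, pending = await asyncio.wait(
                tasks, return_when=asyncio.FIRST_COMPLETED
            )
            # One side is gone: close both ends, then wait for the
            # other task instead of cancelling it.
            await client_ws.close()
            await core_ws.close()
            if pending:
                await asyncio.wait(pending)
    return client_ws
```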
* Add API for swap configuration
Add HTTP API for swap size and swappiness to /os/config/swap. Individual
options can be set in JSON and call the DBus API added in OS
Agent 1.7.x, available since OS 15.0. Check for the presence of an OS of
the required version and return 404 if the criteria are not met.
* Fix type hints and reboot_required logic
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
* Fix formatting after adding suggestions from GH
* Address @mdegat01 review comments
- Improve swap options validation
- Add swap to the 'all' property of dbus agent
- Use APINotFound with reason instead of HTTPNotFound
- Reorder API routes
---------
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
* Report stage with error in jobs
* Copy doesn't lose track of the successful copies
* Add stage to errors in api output test
* revert unnecessary change to import
* Add tests for a bit more coverage of copy_additional_locations
* Handle unexpected WebSocket messages during auth
When an add-on does not respond or closes the WebSocket connection
during the authentication phase, Supervisor does not handle errors
gracefully. Simply log such unexpected authentication attempts to avoid
unnecessary stack traces in the log and make such cases no longer
appear on Sentry.
* Add pytest
* Introduce a timeout of 10s
It seems that some systems continuously run into D-Bus errors,
overwhelming the system itself and also generating lots of errors on
Sentry. Rate limit D-Bus errors to 3 for every message every 30s.
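A sliding-window limiter along these lines would do (sketch with
hypothetical names, not the actual Supervisor code):
```python
import time
from collections import defaultdict, deque

class RateLimiter:
    """Allow at most `limit` events per `window` seconds per key."""

    def __init__(self, limit: int = 3, window: float = 30.0) -> None:
        self.limit = limit
        self.window = window
        self._events: dict[str, deque[float]] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        now = time.monotonic()
        events = self._events[key]
        while events and now - events[0] > self.window:
            events.popleft()  # drop events outside the window
        if len(events) >= self.limit:
            return False  # rate limited, e.g. skip the Sentry report
        events.append(now)
        return True
```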
The WipeDevice method was dropped from OS Agent code in [1]. Remove it from
the mock class to sync with the current API. There is no usage of
WipeDevice in the Supervisor codebase, only ScheduleWipeDevice is
called.
[1] https://github.com/home-assistant/os-agent/pull/225
* Remove release assets tarball on frontend update
* Fix path to the removed tarball
---------
Co-authored-by: Jan Čermák <sairon@users.noreply.github.com>
* Finish out effort of adding and enabling blockbuster
* Skip getting addon file size until securetar fixed
* Fix test for devcontainer and blocking I/O
* Fix docker fixture and load_config to post_init
* Use Sentry helper function to report warnings
Don't use Sentry directly but the existing helper function.
* Add pytest that Sentry is by default off
* Address ruff
* Address ruff
* Add blockbuster library and find I/O from unit tests
* Fix lint and test issue
* Fixes from feedback
* Avoid modifying webapp object in executor
* Split su options validation and only validate timezone on change
This essentially reverts PR #5685.
The Sentry `AsyncioIntegration` replaces the asyncio task factory with
its instrumentalized version, which reports all exceptions which
aren't handled *within* a task to Sentry.
However, we quite often run tasks and handle exceptions outside, e.g.
this common pattern (example from `MountManager`'s `reload()`):
```python
results = await asyncio.gather(
*[mount.update() for mount in mounts], return_exceptions=True
)
... create resolution issues from results with exceptions ...
```
Here, asyncio.gather() uses ensure_future(), which converts the
co-routines to tasks. These Sentry instrumented tasks will then report
exceptions to Sentry, even though we handle exceptions gracefully.
So the `AsyncioIntegration` doesn't work for our use case, and causes
unnecessary noise in Sentry. Disable it again.
* Fix add-on store repository getting removed without internet
Currently, when a git command error happens in `pull()`, we declare
the repository as corrupt. Subsequent system autofix runs then execute
the reset resolution, which essentially removes the git repository from
the system.
In situations where the Internet fails right between the last
Supervisor connectivity check and the add-on store repository update
(the connectivity checks are throttled to once every 10 minutes while
connectivity is considered good), or if the outage is only partial
(e.g. reaching the connectivity check works but the store repository is not
reachable), this leads to a git command error which declares the
repository as corrupt just as well, and ultimately leads to the removal
of the add-on store repository from the local system.
Run a git ls-remote first, which is used as an extra connectivity check.
This will also avoid removing the repository if Internet connectivity
works but the git provider is temporarily down or not reachable.
That said, it will also fail if the repository is no longer present.
But this case needs extra handling anyway.
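The probe itself is a plain git invocation, roughly like this sketch
(the real change runs it in an executor, see the bullet below):
```python
import subprocess

def remote_reachable(url: str, timeout: float = 10.0) -> bool:
    """Probe the remote before pull so a network outage is not
    mistaken for a corrupt local repository."""
    try:
        subprocess.run(
            ["git", "ls-remote", "--heads", url],
            check=True,
            capture_output=True,
            timeout=timeout,
        )
        return True
    except (subprocess.CalledProcessError, subprocess.TimeoutExpired):
        return False
```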
* Run git ls-remote in executor
* Make sure to close file stream after backup upload
Currently the file stream does not get closed before importing the
file. It seems the test case didn't catch that, presumably because
it is a race condition whether the bytes get flushed to disk or not.
Properly close the stream before continuing to handle the file.
* Close file stream in executor
* Add comment about closing twice is fine
* Move read_text to executor
* switch to async_capture_exception
* Finish moving read_text to executor
* Cover read_bytes and some write_text calls as well
* Fix await issues
* Fix format_message
When the connection is closed by the client, the journal_logs_reader
generator still returns new lines, trying to write each one of them to
the closed transport. With debug logging enabled, this can end up in an
endless loop. To fix that, break out of the loop immediately after the
connection is reset.
* Exclude non-Supervisor Server Error codes from Sentry reporting
Exclude status codes 502 Bad Gateway and 503 Service Unavailable from
the Sentry aiohttp integration. These are returned by Supervisor itself
when acting as a proxy for Home Assistant Core. These aren't errors of
Supervisor.
* ruff check
* Replace non-unicode characters for add-on static files
Add-on documentation and changelog get read and returned as text file.
However, in case the original author used non-unicode characters, or
the file is corrupted, loading currently fails with a UnicodeDecodeError.
Let's just use the built-in replace error handling of Python, so they
appear for the user as non-unicode characters by replacing them with
the official unicode replacement character "�".
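In essence (the file name is illustrative):
```python
from pathlib import Path

# Undecodable bytes become U+FFFD instead of failing the whole
# request with a UnicodeDecodeError.
text = Path("DOCS.md").read_text(encoding="utf-8", errors="replace")
```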
* Remove superfluous parameter for binary files
* ruff format
* Add pytests
When initially loading the store manager, update_repositories makes
sure that all repositories are actually present. If they are for some
reason corrupted or content is missing, we currently still try
to load them, which leads to an unnecessary warning:
```
2025-03-03 11:55:54.324 WARNING (SyncWorker_1) [supervisor.store.data] No repository information exists at /data/addons/git/a0d7b954
...
2025-03-03 11:55:54.343 INFO (MainThread) [supervisor.store.git] Cloning add-on https://github.com/hassio-addons/repository repository
```
Since update_repositories always loads the data, simply remove the
superfluous earlier loading attempt.
While at it, also improve the cloning/update log messages to make it
clear what repository is cloned/updated.
* Suppress all ClientConnectionReset when returning logs
In #5358 we started suppressing ClientConnectionReset when logs are
returned from the Journal Gateway and the client ends the connection
unexpectedly. The connection can also be closed when the headers are
returned, so ignore that error as well.
Refs #5606
* Log ClientConnectionResetError as DEBUG instead of suppressing it
* Fix cloning of add-on store repository
Since #5669, the add-on store reset no longer deletes the root
directory. However, if the root directory is not present, the current
code no longer invokes cloning, instead tries to load the git
repository directly.
With this change, the code clones whenever there is no .git directory,
which works for both cases.
* Fix pytest
* Move read_text to executor
* Fix issues found by coderabbit
* formated to formatted
* switch to async_capture_exception
* Find and replace got one too many
* Update patch mock to async_capture_exception
* Drop Sentry capture from format_message
The error handling got introduced in #2052, however, #2100 essentially
makes sure there will never be a byte object passed to this function.
And even if there were, the Sentry aiohttp plug-in would properly catch
such an exception.
---------
Co-authored-by: Stefan Agner <stefan@agner.ch>
Since #5696 we don't need to load the resolution center early. In fact,
with #5686 this is even problematic for pytests in devcontainer, since
the Supervisor Core state is valid and this causes AppArmor evaluations
to run (and fail).
Actually, #5696 removed the resolution center. #5686 brought it
accidentally back. This was seemingly a merge error.
By default, warnings are simply printed to stderr. This makes them
easy to miss in the log. Capture warnings and use the Python logger to log
them with warning level.
Also, if the message is an instance of Exception (which it typically
is), report the warning to Sentry. This is e.g. useful for asyncio
RuntimeWarning warnings "coroutine was never awaited".
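The capturing part relies on the standard library; a minimal sketch
(the Sentry forwarding in the actual change is omitted here):
```python
import logging
import warnings

logging.basicConfig(level=logging.WARNING)
logging.captureWarnings(True)  # route warnings to the "py.warnings" logger

warnings.warn("something looks off", RuntimeWarning)
# Now logged as a warning instead of being printed to stderr.
```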
* Initialize Supervisor Core state in constructor
Make sure the Supervisor Core state is set to a value early on. This
guarantees that the state is always of type CoreState, and that any
use of the state can rely on it being an actual value from the
CoreState enum.
This fixes Sentry filter during early startup, where the state
previously was None. Because of that, the Sentry filter tried to
collect more context, which led to an exception and errors not being
reported.
* Fix pytest
It seems that with initializing the state early, the pytest actually
runs a system evaluation with:
Starting system evaluation with state initialize
Before it did that with:
Starting system evaluation with state None
It detects that the container runs as privileged, and declares the
system as unhealthy.
It is unclear to me why coresys.core.healthy was checked in this
context, it doesn't seem useful. Just remove the check, and validate
the state through the getter instead.
* Update supervisor/core.py
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
* Make sure Supervisor container is privileged in pytest
With the Supervisor Core state being valid now, some evaluations
now actually run when loading the resolution center. This leads to
Supervisor getting declared unhealthy due to not running in a privileged
container under pytest.
Fake the host container to be privileged to make evaluations not
causing the system to be declared unhealthy under pytest.
* Avoid writing actual Supervisor run state file
With the Supervisor Core state being valid from the very start, we end
up writing a state file every time.
Instead of actually writing a state file, simply validate that the
necessary calls are being made. This is more in line with typical unit
tests and avoids writing a file for every test.
* Extend WebSocket client fixture and use it consistently
Extend the ha_ws_client WebSocket client fixture to set Supervisor Core
into run state and clear all pending messages.
Currently only some tests use the ha_ws_client WebSocket client fixture.
Use it consistently for all tests.
---------
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
* Load resolution evaluation, check and fixups early
Before #5652, these modules were loaded in the constructor, hence early
in `initialize_coresys()`. Moving them late actually exposed an issue
where NetworkManager connectivity setter couldn't get the
`connectivity_check` evaluation, leading to an exception early in
bootstrap.
Technically, it might be safe to load the resolution modules only in
`Core.connect()`, however then we'd have to load them separately for
pytest. Let's go conservative and load them the same place where they
got loaded before #5652.
* Load resolution modules in a single executor call
* Fix pytest
A newline is converted to a space as per YAML folding rules. The space
breaks markdown parsing of the link. Use a single line for the target
version link.
When developing/testing in a Supervised environment, the
systemd-journal-gatewayd socket is actually available. Mock the
socket Path file to make the test independent of the pytest
environment.
Overwrite the default body with useful version update information and
a link to the new release.
Also rename the title and use lowercase for local shell variables.
Enable the Sentry asyncio integration. This makes sure that exceptions
in non-awaited tasks get reported to Sentry.
While at it, use partial instead of lambda for the filter function.
* Correctly handle aiohttp requests
The request header seems to be a dictionary in the current Sentry SDK.
The previous code actually failed with an exception when trying to
unpack the header. However, it seems that exceptions are not handled
or printed in this filter function, so those issues were simply
swallowed.
The new code has been tested to correctly sanitize and report issues
during aiohttp requests.
* Fix pytests
* Initialize machine information before Sentry
* Set user and machine for all reports
Now that we initialize machine earlier we can report user and machine
for all events, even before Supervisor is completely initialized.
Also use the new tag format which is a dictionary.
Note that it seems that with the current Sentry SDK version the
AioHttpIntegration no longer sets the URL as a tag, so sanitization is
no longer required.
* Update pytests
The word "reboot" is usually used when an operating system is restarted.
The current log message could be interpreted as the Supervisor having
detected an operating system reboot.
Use restart to make it clear that the Supervisor detected a restart of
itself.
Make sure that add-on store resets do not delete the root folder. This
is important so that successive reset attempts do not fail (the
directory passed to `remove_folder` must exist, otherwise find fails
with a non-zero exit code).
While at it, handle find errors properly and report errors as critical.
* Improve D-Bus timeout error handling
Typically D-Bus timeouts are related to systemd activation timing out
after 25s. The current dbus-fast timeout of 10s is well below that
so we never get the actual D-Bus error. This increases the dbus-fast
timeout to 30s, which will make sure we wait long enough to get the
actual D-Bus error from the broker.
Note that this should not slow down a typical system, since previously
we tried three times, each waiting for 10s. With the new error handling
we'll typically end up waiting 25s and then receive the actual D-Bus
error. There
is no point in waiting for multiple D-Bus/systemd caused timeouts.
* Create D-Bus TimedOut exception
* Handle permission error on backup create
Make sure we handle (write) permission errors when creating a backup.
* Introduce BackupFileExistError and BackupPermissionError exceptions
* Make error messages a bit more uniform
* Drop use of exclusive mode
SecureTar does not handle exclusive mode nicely. Drop use of it for now.
With PR #5634 (which had the goal to remove I/O in event loop for backup
operations) the semantics of `remove_folder` changed slightly: Non-zero
exits of subprocesses were no longer handled, but led to a
CalledProcessError.
Now to restore the semantics of `remove_folder` we should simply log an
error. However, this semantic change actually uncovered a potential
problem in deployed systems: There are 34 users on the beta channel who
regularly seem to run `FixupStoreExecuteReset`, and with the semantic
change we see those errors in Sentry.
An obvious problem could be no storage. But in a quick test that would
not execute the repair in the first place since the fixup has the job
condition `FREE_SPACE` set. So the problem is likely elsewhere.
With this change, we log the stderr of find, while still raising the
exception. With that we should get more context in Sentry to see what
could be the underlying error.
* Remove I/O in event loop for add-on backup and restore
Remove I/O in event loop for add-on backup and restore operations. On
backup, this moves the add-on shutdown before metadata is stored in the
backup, which slightly lengthens the time the add-on is actually stopped.
However, the biggest contributor here is likely adding the image
itself if it is a local backup. However, since that is the minority of
cases, I've opted for simplicity over optimizing for this case.
* Use partial to explicitly bind arguments
* Remove I/O in event loop for Home Assistant Core backup
The Home Assistant Core backup still contains some I/O in the event
loop. Move all I/O into the executor.
* Update supervisor/homeassistant/module.py
Co-authored-by: Mike Degatano <michael.degatano@gmail.com>
---------
Co-authored-by: Mike Degatano <michael.degatano@gmail.com>
* Avoid IO in event loop when removing backup
* Refactor backup size calculation
Currently size is lazy loaded when required via properties. This
however is blocking the async event loop.
Backup sizes don't change. Instead of lazily loading the size of a
backup, simply determine it on loading/after creation.
* Fix tests for backup size change
* Avoid IO in event loop when loading backups
* Avoid IO in event loop when importing a backup
We don't intend to run uv again, so the cache is not really useful.
The cache directory size is around 80MB, however, the files are mostly
hardlinks to the original files in `/usr/local/lib/python3.13/site-packages`
so the actual saving is much smaller.
* Remove I/O from backup create() function
* Move mount check into executor thread
* Remove I/O from backup open() function
* Remove I/O from _folder_save()
* Refactor remove_folder and remove_folder_with_excludes
Make remove_folder and remove_folder_with_excludes synchronous
functions which need to be run in an executor thread to be safely used
in asyncio. This makes them better composable with other I/O operations
like checking for file existence etc.
* Fix logger typo
* Use return values for functions running in an executor
* Move location check into a separate function
* Fix extract
Since transition from pip to uv in #5152, Supervisor container doesn't
contain bytecode for site-packages anymore, and because our AppArmor
profile denies mkdir operations, the compiled *.pyc files are never
created. Enable the uv --compile option to opt for the same behavior as
pip had, to fix the AA errors and the potential penalty of compilation
on every import.
* Validate Backup always before restoring
Since #5519 we check the encryption password early in the restore case.
This has the side effect that we check the file existence early too.
However, in the non-encryption case, the file is not checked early.
This PR changes the behavior to always validate the backup file before
restoring, ensuring both encryption and non-encryption cases are
handled consistently.
In particular, the last case of test_restore_immediate_errors actually
validates that behavior. That test should actually have failed so far.
But it seems that because we validate the backup shortly after freeze
anyway, the exception still got raised early enough.
A simple `await asyncio.sleep(10)` right after the freeze makes the
test case fail. With this change, the test works consistently.
* Address pylint
* Fix backup_manager tests
* Drop warning message
When delivering multiple messages to Core, and the first fails with a
connection error, the second message will also fail with the same error.
But at this point the connection is already closed, which leads to an
exception in the exception handler. Avoid this compounding error by
checking if the connection still exists before trying to close.
Fixes: #5629
* Delay initial version fetch until there is connectivity
* Add test
* Only mock get not whole websession object
* drive delayed fetch off of supervisor connectivity not host
* Fix test to not rely on sleep guessing to track tasks
* Use fixture to remove job throttle temporarily
* Drop Docker config from Supervisor backup
The Docker config is part of the main backup metadata. Because we
consolidate encrypted and unencrypted backups today, this leads to
potential bugs when restoring a backup.
* Drop obsolete encrypt/decrypt functions
* Drop unused Backup Job stage
* Fix restoring unencrypted backup in corner case
If a backup has an encrypted and an unencrypted location, and the
encrypted location is being restored first, the encryption key is still
cached. When the user restores the unencrypted backup next, it will fail
because the Supervisor still tries to use the encryption key.
* Add integration test for restoring backups with and without encryption
* Rename _validate_location_password to _set_location_password
* Reload backup metadata from restore location
* Revert "Reload backup metadata from restore location"
This reverts commit 9b47a1cfe9a2682a0908e08cd143373744084fb7.
* Make pytest work/punt the ball on docker config restore issue
* Address pylint error
* Handle non-existing file in Backup password check too
Make sure we handle a non-existing backup file also when validating
the password.
* Update supervisor/backups/manager.py
Co-authored-by: Mike Degatano <michael.degatano@gmail.com>
* Add test case and fix password check when multiple locations
* Mock default backup unprotected by default
Instead of setting the protected property which we might not use
everywhere, simply mock the default backup to be unprotected.
* Fix mock of protected backup
* Introduce test for validate_password
Testing showed that validate_password doesn't return anything. Extend
tests to cover this case and fix the actual code.
---------
Co-authored-by: Mike Degatano <michael.degatano@gmail.com>
* Avoid reordering add-on repositories on Backup load
The `ensure_builtin_repositories` function uses a set to deduplicate
items, which sometimes led to a change of order in elements. This is
problematic when deduplicating Backups.
Simply avoid mangling the list of add-on repositories on load. Instead
rely on `update_repositories` which uses the same function to ensure
built-in repositories when loading the store configuration and restoring
a backup file.
* Update tests
* ruff format
* ruff check
* ruff check fixes
* ruff format
* Update tests/store/test_validate.py
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
* Simplify test
---------
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
* Make the API return 404 for non-existing backup files
* Introduce BackupFileNotFoundError exception
* Return 404 on full restore as well
* Fix remaining API tests
* Improve error handling in delete
* Fix pytest
* Fix tests and change error handling to agreed logic
---------
Co-authored-by: Mike Degatano <michael.degatano@gmail.com>
If the system is running for a long time, or the logging is particularly
chatty, the Systemd journal message we use to detect boot will be
rotated out of the journal. Currently we only handled this if there was
one boot, but we usually missed the oldest boot if there were
more boots.
Adjust the method for getting boot IDs to always get the very first log
line in the journal instead of the last one, and make sure its boot ID
is included in the list.
* Avoid test failure by not checking exact size of backup
This is a workaround for the fact that the backup size is not exactly
the same every time. This is due to the fact that the inner gzipped tar
file can vary in size due to differences in the json file (key order) and
potentially also different field values (UUID, backup slug).
It seems that sorting the keys makes the actual difference today, but
this has runtime overhead and might not catch all cases.
Simply check if size property is there and a number bigger than 0
instead.
* Fix pytest
* Extend backup upload API with file name parameter
Add a query parameter which allows specifying the file name on upload.
All locations will store the backup with the same file name.
* ruff format
* Update tests to cover bad filename
* Fix ruff check error
* Drop unnecessary logging
* Use version which is treated CalVer by AwesomeVersion
The current dev version `99.9.9dev` is treated as an unknown version
type by AwesomeVersion. This prevents the version from comparing with
actual Supervisor versions, e.g. from an existing backup file.
Make the development version a valid CalVer version so development
versions can handle non-development backups.
* Bump to year 9999
* Backup protected status can vary per location
* Fix test_backup_remove_error test
* Update supervisor/backups/backup.py
* Add Docker registry configuration to backup metadata
* Make use of backup location fixture
* Address pylint
---------
Co-authored-by: Stefan Agner <stefan@agner.ch>
* Extend backup API with file name field
Allow specifying a backup file name when creating a backup. This allows
for user-friendly backup file names. If none is specified, the current
behavior remains (backup file name is the backup slug).
* Check passed file name using regex
* Use custom filename on download only if backup file name is backup slug
* ruff format
* Remove path from location for download file name
* Bump Supervisor to Python 3.13
* Update ruff configuration to 0.9.1
Adjust pyproject.toml for ruff 0.9.1. Also make sure that the latest
version of ruff is used in pre-commit.
* Set default configuration for pytest-asyncio
* Run ruff check
* Drop deprecated decorator no_type_check_decorator
The upstream PR (https://github.com/python/cpython/issues/106309) says
this never got really implemented by type checkers.
* Bump devcontainer to latest release
Some devices are provided by kernel modules which potentially get loaded
later at startup. Those are not listed by udev, and hence add-ons do
not get permissions for these types of devices as long as the kernel
module is not loaded.
Typically, such devices are created by the kmod-static-nodes.service
systemd service. Ideally, we would read the output of that service and
add those specifically. However, there are very few devices which
use static nodes, and we are actually only really interested in tun. So
let's simply add this static node in case udev does not list it already.
Introduce a validate password method which only peeks into the archive
to validate the password before starting the actual restore process.
This makes sure that a wrong password returns an error even when
restoring the backup in background.
When a backup task is run in the background but errors early, the
secondary event task to release the callee still lingers around,
ultimately leading to a "Task was destroyed but it is pending!"
asyncio error.
Make sure we cancel the event task in case the backup returns early.
Currently the API converts backup locations on network mounts to
the Supervisor's Mount representation. However, the locations stored
in the backup representation are a dictionary with the location
string as key.
Make sure to use the backup location string to validate the remove
requests. This fixes removing backups from network storage mounts.
* Fix and extend cloud backup support
* Clean up task for cloud backup and remove by location
* Args to kwargs on backup methods
* Fix backup remove error test and typing clean up
When a client requests (or more often follows) some busy logs and closes
the connection while the StreamWriter tries to write into it, an
exception is raised. This should be harmless, and unless there's another
way to handle this gracefully (I'm not aware of one), it should be safe
to ignore the exception in this context.
When an error occurs when streaming Supervisor logs, the fallback method
receives the follow kwarg as well, which is invalid for the Docker log
handler:
TypeError: APISupervisor.logs() got an unexpected keyword argument 'follow'
The exception is still printed to the logs but with all the extra noise
caused by this error. Removing the argument makes the stack trace more
comprehensible and the fallback actually works as desired.
* Throttle connectivity check on connectivity issue
If Supervisor detects a connectivity issue, currently every function
which requires internet gets delayed by 10s due to the connectivity
check. This especially slows down initial startup when there are
connectivity issues. It is unlikely to resolve immediately, so throttle
the connectivity check to once every 30s.
* Fix pytest
* Reset throttle in test and refactor helper
* CodeRabbit suggestion
---------
Co-authored-by: Mike Degatano <michael.degatano@gmail.com>
* Return cursor of the first host logs entry via headers
Return first entry's cursor via custom `X-First-Cursor` header that can
be consumed by the client and used for continual requesting of the
historic logs. Once the first fetch returns data, the cursor can be
supplied as the first argument to the Range header in another call,
fetching accurate slice of the journal with the previous log entries
using the `Range: entries=cursor[[:num_skip]:num_entries]` syntax.
Let's say we fetch logs with the Range header `entries=:-19:20` (to
fetch the very last 20 lines of the logs, see below why not
`entries=:-20:20`) and we get `cursor50` as the reply (the actual value
will be much more complex and with no guaranteed format). To fetch
previous slice of the logs, we use `entries=cursor50:-20:20`, which
would return 20 lines previous to `cursor50` and `cursor30` in the
cursor header. This way we can go all the way back to the history.
One problem with the cursor is that it's not possible to determine when
the negative num_skip points beyond the first log entry. In that case
the client either needs to know what the first entry is (via
`entries=:0:1`) or can iterate naively and stop once two subsequent
requests return the same first cursor.
Another caveat, even though it's unlikely to be hit in real usage, is that it's not possible to fetch the last line only - if no cursor is provided, a negative num_skip argument is needed, and in that case we're pointing one record back from the current cursor, which is the previous record. The least we can return without knowing any cursor is thus `entries=:-1:2` (where the `2` can be omitted, however with `entries=:-1:1` we would lose the last line). This also explains why different `num_skip` and `num_entries` must be used for the first fetch.
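As a usage illustration, paging backwards through the journal with this scheme could look like the following (URL and endpoint are placeholders):

```python
import aiohttp


async def page_backwards(session: aiohttp.ClientSession, url: str) -> None:
    # First fetch: no cursor known yet, so skip -19 and take 20 entries.
    headers = {"Range": "entries=:-19:20"}
    cursor = None
    while True:
        async with session.get(url, headers=headers) as resp:
            print(await resp.text())
            first = resp.headers.get("X-First-Cursor")
        # The same first cursor twice means we reached the beginning.
        if first is None or first == cursor:
            break
        cursor = first
        # Next fetch: the 20 entries preceding the first one we have.
        headers = {"Range": f"entries={cursor}:-20:20"}
```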
* Fix typo (fallback->callback)
* Refactor journal_logs_reader to always return the cursor
* Update tests for new cursor handling
If no cursor is specified and a negative num_skip is used, we're pointing one record back from the last one, so host logs always returned 101 lines by default. This was also the case for the lines query argument, which used the number directly as num_skip. Instead of doing that, point N-1 records back and then get N records. Handle 1 record and invalid numbers silently, to avoid the need for error handling in impractical edge cases.
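In other words, a `lines=N` query roughly translates to skipping N-1 records back and fetching N; a sketch of that mapping (the clamping details are an assumption):

```python
def range_for_lines(lines: int) -> str:
    # Clamp invalid values silently; a single line is bumped to two,
    # since the last line alone cannot be addressed without a cursor
    # (see the caveat above).
    lines = max(lines, 2)
    # Without a cursor a negative num_skip already points one record
    # back, so skip N-1 records and take N to get the last N lines.
    return f"entries=:-{lines - 1}:{lines}"


assert range_for_lines(100) == "entries=:-99:100"
```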
* Update DNS plug-in on network change
Restart the DNS plug-in when the primary network connection changes. This makes sure any potential host OS DNS configuration changes get picked up by the DNS plug-in as well.
* Add a test case
---------
Co-authored-by: Mike Degatano <michael.degatano@gmail.com>
* Bump pylint from 3.2.7 to 3.3.0
Bumps [pylint](https://github.com/pylint-dev/pylint) from 3.2.7 to 3.3.0.
- [Release notes](https://github.com/pylint-dev/pylint/releases)
- [Commits](https://github.com/pylint-dev/pylint/compare/v3.2.7...v3.3.0)
---
updated-dependencies:
- dependency-name: pylint
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
* Set positional arguments limit to 10
This makes the current codebase pass with pylint 3.3.0 while still
warning in case many positional arguments are used.
* Move to design section
---------
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Stefan Agner <stefan@agner.ch>
* Improve WiFi settings error handling
Currently, the frontend potentially provides no WiFi settings dictionary but still tries to update other (IP address) settings on the interface. This leads to a stack trace since NetworkManager is not able to fetch the WiFi settings from the settings dictionary. Simply fill out what we can and let NetworkManager provide an error.
Also allow disabling a network interface which has no configuration. This avoids an error when switching to auto, back to disabled, and then pressing save on a new wireless network interface.
* Add debug message when already disabled
* Add pytest for incomplete WiFi settings as posted by frontend
Simulate the frontend posting no WiFi settings. Make sure the Supervisor
handles this gracefully.
* Allow to set user DNS through API with auto mode
Currently it is only possible to set DNS servers when in static mode. However, there are use cases for setting DNS servers in auto mode as well, e.g. if no local DNS server is provided by DHCP, or the provided DNS server turns out to be non-functional.
* Fix fallout of using a separate data structure for IP configuration
Make sure the gateway is correctly converted to the internal IP representation. Fix type info.
* Overwrite WiFi settings completely too
* Add test for DNS configuration
* Run ruff format
* ruff format
* Use schema validation as source for API defaults
Instead of using replace() simply set the API defaults in the API
schema.
* Revert "Use schema validation as source for API defaults"
This reverts commit 885506fd37395eb6cea9c787ee23349dac780b75.
* Use explicit dataclass initialization
This avoids the unnecessary replaces from before. It also makes it more obvious that this part of the API doesn't patch existing settings.
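Illustratively, with a hypothetical dataclass, the difference is between patching a default instance and constructing the result outright:

```python
from dataclasses import dataclass, replace


@dataclass
class IpSetting:  # hypothetical shape, not the actual definition
    method: str = "auto"
    dns: list[str] | None = None


# Before: start from a default and patch it, which obscures the fact
# that existing settings are not carried over.
setting = replace(IpSetting(), method="static", dns=["1.1.1.1"])

# After: explicit initialization makes the full replacement obvious.
setting = IpSetting(method="static", dns=["1.1.1.1"])
```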
Since headers are clumsy given the Core proxy between the frontend and Supervisor, add a way to adjust the number of lines and the verbose log format using query parameters as well. If both query parameters and headers are supplied, prefer the former, as they are more prominent when reading through the request logs.
* Make IPv4 and IPv6 parse errors raise an API error
Currently, IP address parsing errors lead to an exception which is not handled by the `api_validate()` call. By using concrete IPv4 and IPv6 types and `vol.Coerce()`, parsing errors are properly handled.
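A sketch of the pattern with voluptuous: `vol.Coerce` turns the `ValueError` raised by the `ipaddress` constructors into a proper `Invalid` that `api_validate()` can report (the schema keys are made up):

```python
from ipaddress import IPv4Address, IPv6Address

import voluptuous as vol

SCHEMA_IP_CONFIG = vol.Schema(
    {
        vol.Optional("nameservers_v4"): [vol.Coerce(IPv4Address)],
        vol.Optional("nameservers_v6"): [vol.Coerce(IPv6Address)],
    }
)

SCHEMA_IP_CONFIG({"nameservers_v4": ["1.1.1.1"]})   # validates
# SCHEMA_IP_CONFIG({"nameservers_v4": ["no-ip"]})   # raises vol.Invalid
```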
* ruff format
* ruff check
* Improve connection settings fixture
Make the connection settings fixture behave more closely to the actual
NetworkManager. The behavior has been tested with NetworkManager 1.42.4
(Debian 12) and 1.44.2 (HAOS 13.1). This likely behaves similarly in older versions too.
* Introduce separate skeleton and settings for wireless
Instead of having a combined network settings object which has
Ethernet and Wireless settings, create a separate settings object for
wireless.
* Handle addresses/address-data property like NetworkManager
* Address ruff check
* Improve network API test
Add a test which changes from "static" to "auto". Validate that settings
are updated accordingly. Specifically, today this does clear the DNS
setting (by not providing the property).
* ruff format
* ruff check
* Complete TEST_INTERFACE rename
* Add partial network update as test case
* Test stub for keeping shared images after update
* Keep shared images on addon update
* ImageNotFound should only skip the one image not all
* Fix tests and NoneType error
* Normalize logic between two cleanup methods
When a boot slot is selected manually in GRUB, the system boots into this slot and marks it as good. However, the boot order is not changed, so on the next boot (after an explicit or unexpected reboot) HAOS returns to the version in the other slot. This might be confusing, because if the system has been running for some time, the user can forget they changed the boot slot to fix an issue they had.
This gets more confusing if the "other" boot slot is selected manually three times in a row. Let's say we have ORDER="A B". This means that every time GRUB starts, it wants to boot slot A. If slot B is selected instead, only A_TRY is incremented; the system boots into slot B and marks slot B as good (B_OK=1, B_TRY=0). On another boot this repeats, yet A_TRY is incremented again. Until it reaches 3, slot A would always be chosen automatically; only after that would GRUB boot to slot B, presuming slot A is dead. The ORDER variable will still be unchanged though.
This commit only makes sure that when the system is marked as healthy, the slot is marked both as good AND active, updating the ORDER variable as well. Because the X_TRY counter is incremented by GRUB, if we want the other slot not to be marked as bad, we need to adjust the logic in the OS's grub.cfg as well, because Supervisor can't know whether it's appropriate to change the other slot's state or not.
I also took the liberty of adjusting the logging a bit, to include the stack trace in the error log if marking the slot fails somehow.
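Conceptually the change boils down to marking the booted slot "active" instead of only "good"; against RAUC's D-Bus API (which manages these GRUB variables on HAOS) that might look roughly like this, with `rauc` as a hypothetical async proxy:

```python
import logging

_LOGGER = logging.getLogger(__name__)


async def mark_healthy(rauc) -> None:
    try:
        # "good" resets the TRY counter; "active" additionally rewrites
        # ORDER so the booted slot stays first on subsequent boots.
        await rauc.mark("active", "booted")
    except Exception:
        # logger.exception keeps the stack trace in the error log.
        _LOGGER.exception("Can't mark booted slot as active")
```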
* Add manual_forced option to addon boot config
* Include client library in pull request template
* Add boot_config to api output so frontend can use it
* `manual_forced` to `manual_only`
Instead of using double quotes inside f-string expressions, use single quotes. This works with Python 3.11 too. We don't use Python 3.11 anymore, but it is useful when running pytest natively on Debian Bookworm, which comes with Python 3.11.
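For illustration: reusing the outer quote inside an f-string expression only became valid with Python 3.12 (PEP 701):

```python
config = {"image": "supervisor"}

# SyntaxError on Python 3.11, valid on Python 3.12+:
#   msg = f"Image {config["image"]} not found"

# Works on both: use the other quote style inside the expression.
msg = f"Image {config['image']} not found"
```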
This PR minimizes the D-Bus requirements for tests. It does this by using dbus-daemon directly instead of dbus-launch. The latter is meant for graphical applications and therefore has X11 dependencies. It also leaves the D-Bus daemon running after the tests are done, which accumulates dbus-daemon processes over time; that is not ideal.
I've also considered using dbus-run-session since it is meant to launch
processes with a private D-Bus session. For Python tests one could
launch it like so:
dbus-run-session -- python3 -m pytest ...
Then `DBUS_SESSION_BUS_ADDRESS` would be used automatically by the
`MessageBus` class. However, to keep the current behavior of the tests,
launching the D-Bus daemon manually is the better option.
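A sketch of what launching the daemon manually can look like in a pytest fixture, assuming `dbus-daemon` is on PATH:

```python
import os
import signal
import subprocess

import pytest


@pytest.fixture
def dbus_session_bus():
    # --print-address=1 writes the bus address to stdout; no --fork, so
    # the daemon stays our child and can be terminated afterwards.
    proc = subprocess.Popen(
        ["dbus-daemon", "--session", "--print-address=1"],
        stdout=subprocess.PIPE,
        text=True,
    )
    address = proc.stdout.readline().strip()
    os.environ["DBUS_SESSION_BUS_ADDRESS"] = address
    yield address
    proc.send_signal(signal.SIGTERM)
    proc.wait()
```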
* Use separate data structure for IP configuration
So far we used the same IpConfig data structure to represent both the user's IP settings and the currently applied IP configuration.
This commit separates the two into IpConfig (the currently applied IP configuration) and IpSetting (the user-provided IP settings).
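The split might look like this; the field names are illustrative, not the actual Supervisor definitions:

```python
from dataclasses import dataclass
from ipaddress import IPv4Address, IPv4Interface


@dataclass
class IpSetting:
    """What the user asked for."""

    method: str  # "auto" | "static" | "disabled"
    address: IPv4Interface | None = None
    gateway: IPv4Address | None = None
    nameservers: list[IPv4Address] | None = None


@dataclass
class IpConfig:
    """What is currently applied on the interface."""

    address: list[IPv4Interface]
    gateway: IPv4Address | None
    nameservers: list[IPv4Address]
    ready: bool
```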
* Use custom string constants for connection settings
Use separate string constants for all connection settings. This makes
it easier to search where a particular NetworkManager connection
setting is used.
* Use Python typing for IpAddress in IpProperties
* Address pytest issue
* Improve and extend error handling on D-Bus connect
Avoid initializing the Supervisor board since it does not support the
Properties interface (see https://github.com/home-assistant/os-agent/issues/206).
This prevents the following somewhat confusing warning:
No OS-Agent support on the host. Some Host functions have been disabled.
The OS Agent is actually installed on the host; it was just a single object which caused issues. No functionality was actually lost, as the Supervisor board object currently has no features, and all other interfaces still got properly initialized (thanks to gather()).
Print warnings in a more fine-grained way so it is clear which object exactly causes an issue. Also print a log message on the lowest layer when an error occurs on a D-Bus call. This makes it easier to track down the actual D-Bus error source.
Fixes: #5241
* Fix tests
* Use local variable
* Avoid stack trace when board support fails to load
* Fix tests
* Use override mechanism to disable Properties support
Instead of disabling loading of the Supervisor board entirely, override its initialization to prevent loading the Properties interface.
* Revert "Fix tests"
This reverts commit 1e3c491ace2ebef60945c45573acd8d4464ea83e.
* Stop backup if pre backup failed in Core
* Fix API tests
* Partial backup in CI since there's no Home Assistant
* Add ssl folder to partial backup
* Allow backups when Home Assistant is not running
* Undo change to skip db test
* Home Assistant watchdog attempts safe mode after max fails
* Remove duplicate line
* Refactor and logging change from feedback
* Update supervisor/misc/tasks.py
* Fix log text check in test
---------
Co-authored-by: Stefan Agner <stefan@agner.ch>
* Add Exception handling to UDEV reading and parsing
The Khadas VIM4 udev returns data that Python chokes on when parsing. The exception handling just allows it to continue.
* Add an exception print
Added an exception print for the times things go bad.
* Split exception handling
The exception is not fatal when a parsing error happens on one node: print it and continue.
* cleanups
* swapped functions
The device.device_node function bails out very badly: it raises no exceptions to the top, but complains and errors.
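A sketch of the split handling with pyudev, where one malformed node must not abort the whole scan; `_import_device` stands in for the real parsing:

```python
import logging

import pyudev

_LOGGER = logging.getLogger(__name__)


def scan_devices(context: pyudev.Context) -> list:
    devices = []
    for device in context.list_devices():
        try:
            devices.append(_import_device(device))
        except Exception as err:
            # A parsing error on one node is not fatal: log and continue.
            _LOGGER.warning("Error processing udev device: %s", err)
    return devices


def _import_device(device: pyudev.Device) -> str:
    # Placeholder for the real attribute parsing.
    return device.sys_name
```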
* Update supervisor/hardware/manager.py
Co-authored-by: Stefan Agner <stefan@agner.ch>
* Update supervisor/hardware/manager.py
---------
Co-authored-by: Stefan Agner <stefan@agner.ch>
Co-authored-by: Pascal Vizeli <pvizeli@syshack.ch>
* Max retries for auto applying addon image fixup
* Update supervisor/resolution/fixups/addon_execute_repair.py
Co-authored-by: Stefan Agner <stefan@agner.ch>
---------
Co-authored-by: Stefan Agner <stefan@agner.ch>
This reverts commit 71e91328f1c225d774ba9492e133147470427744.
It is not proven that the segfaults seen in Core are resolved. Since
we are shipping a bugfix release, conservatively revert back to 3.9.15.
name: Report a bug with the Supervisor on a supported System
about: Report an issue related to the Home Assistant Supervisor.
labels: bug
---
<!-- READ THIS FIRST:
- If you need additional help with this template please refer to https://www.home-assistant.io/help/reporting_issues/
- This is for bugs only. Feature and enhancement requests should go in our community forum: https://community.home-assistant.io/c/feature-requests
- Provide as many details as possible. Paste logs, configuration sample and code into the backticks. Do not delete any text from this template!
- If you have a problem with an add-on, make an issue in its repository.
-->
<!--
Important: You can only file a bug report for a supported system! If you run an unsupported installation, this report will be closed without comment.
-->
### Describe the issue
<!-- Provide as many details as possible. -->
### Steps to reproduce
<!-- What do you do to encounter the issue. -->
1. ...
2. ...
3. ...
### Environment details
<!-- You can find these details in the system tab of the supervisor panel, or by using the `ha` CLI. -->
- **Operating System:** xxx
- **Supervisor version:** xxx
- **Home Assistant version**: xxx
### Supervisor logs
<details>
<summary>Supervisor logs</summary>
<!--
- Frontend -> Supervisor -> System
- Or use this command: ha supervisor logs
- Logs are more than just errors; even if you don't think it's important, it is.
If you don't know, it can be found in [Settings -> System -> Repairs -> (three dot menu) -> System Information](https://my.home-assistant.io/redirect/system_health/).
It is listed as the `Installation Type` value.
options:
- Home Assistant OS
validations:
required: true
attributes:
label: System information
description: >
The System information can be found in [Settings -> System -> Repairs -> (three dot menu) -> System Information](https://my.home-assistant.io/redirect/system_health/).
Click the copy button at the bottom of the pop-up and paste it here.
[](https://my.home-assistant.io/redirect/system_health/)
label: Supervisor diagnostics
placeholder: "drag-and-drop the diagnostics data file here (do not copy-and-paste the content)"
description: >-
Supervisor diagnostics can be found in [Settings -> Devices & services](https://my.home-assistant.io/redirect/integrations/).
Find the card that says `Home Assistant Supervisor`, open it, and select the three dot menu of the Supervisor integration entry
and select 'Download diagnostics'.
**Please drag-and-drop the downloaded file into the textbox below. Do not copy and paste its contents.**