When fetching host info without the UDisks2 D-Bus connection available, an exception was raised, which disabled managing addons via Home Assistant (and other GUI functions that need host info status).
* Treat containerd snapshotter/overlayfs driver as supported
With home-assistant/operating-system#4252 the storage driver would
change to "overlayfs". We don't want the system to be marked as
unsupported. It should be safe to treat it as supported even now, so add
it to the list of allowed values.
* Flip the logic
* Set valid storage for invalid logging test case
* Add support for ulimit in addon config
Similar to docker-compose, this adds support for setting ulimits
for addons via the addon config. This is useful e.g. for InfluxDB,
which on its own does not support setting higher open file descriptor
limits, but recommends increasing limits on the host.
* Make soft and hard limit mandatory if ulimit is a dict
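Illustratively, such an addon option might map onto Docker's ulimits roughly like this (a sketch using docker-py; the Supervisor's actual schema and key names are assumptions here, following the docker-compose convention of an int shorthand or a dict with soft/hard):
```python
import docker
from docker.types import Ulimit

# Hypothetical addon config value (assumption), mirroring docker-compose:
# either an int shorthand or a dict with mandatory soft/hard limits.
ulimits_config = {"nofile": {"soft": 65535, "hard": 65535}, "nproc": 4096}

ulimits = [
    Ulimit(name=name, soft=value, hard=value)
    if isinstance(value, int)
    else Ulimit(name=name, soft=value["soft"], hard=value["hard"])
    for name, value in ulimits_config.items()
]

client = docker.from_env()
container = client.containers.create("influxdb:2.7", ulimits=ulimits)
```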
* Add progress reporting to addon, HA and Supervisor updates
* Fix assert in test
* Add progress to addon, core, supervisor updates/installs
* Fix double install bug in addons install
* Remove initial_install and re-arrange order of load
* Fix CID file handling to prevent directory creation
It seems that under certain conditions Docker creates a directory
instead of a file for the CID file. This change ensures that
the CID file is always created as a file, and any existing directory
is removed before creating the file.
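A minimal sketch of the guard described above (hypothetical helper name, not the exact Supervisor code):
```python
import shutil
from pathlib import Path


def write_cid_file(path: Path, container_id: str) -> None:
    """Write the container ID, ensuring the target is a regular file.

    If Docker previously auto-created a directory at the bind-mount
    source, remove it before writing the file.
    """
    if path.is_dir():
        shutil.rmtree(path)
    path.write_text(container_id)
```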
* Fix tests
* Fix pytest
* Fix range header to correctly fetch latest logs
Add a colon before line numbers to indicate that no cursor is used.
This makes the range header work when fetching latest logs from
systemd-journal-gatewayd.
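For illustration, the header might be built like this (a sketch assuming gatewayd's `entries=cursor[[:num_skip]:num_entries]` format; exact numbers are illustrative):
```python
def latest_logs_range(lines: int) -> str:
    # The colon before the negative skip marks an empty cursor, so the
    # offset is taken relative to the journal tail, i.e. "last N entries".
    return f"entries=:-{lines - 1}:{lines}"


headers = {"Range": latest_logs_range(100), "Accept": "text/plain"}
```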
* Fix pytest
* Check Core version and raise unsupported if older than 2 years
Check the currently installed Core version relative to the current
date, and if it is older than 2 years, mark the system as unsupported.
Also add a Job condition to prevent automatic refreshing of the update
information in this case.
* Handle landing page correctly
* Handle non-parseable versions gracefully
Also align handling between OS and Core version evaluations.
* Extend and fix test coverage
* Improve Job condition error
* Fix pytest
* Block execution of fetch_data and store reload jobs
Block execution of fetch_data and store reload jobs if the core version
is unsupported. This essentially freezes the installation until the
user takes action and updates the Core version to a supported one.
* Use latest known Core version as reference
Instead of using the current date to determine if the Core version is more
than 2 years old, use the latest known Core version as the reference point
and check if the current version is more than 24 releases behind.
This is crucial because when update information refresh is disabled due to
unsupported Core version, using date would create a permanent unsupported
state. Even if users update to the last known version in 4+ years, the
system would remain unsupported. By using latest known version as reference,
updating Core to the last known version makes the system supported again,
allowing update information refresh to resume.
This ensures users can always escape the unsupported state by updating
to the last known Core version, maintaining the update refresh cycle.
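A sketch of the distance check, assuming Core's monthly `YYYY.M` versioning so that 24 releases correspond to roughly two years (function name and threshold handling are illustrative):
```python
def core_version_outdated(current: str, latest: str, max_behind: int = 24) -> bool:
    """Return True if `current` is more than `max_behind` monthly
    releases behind the latest *known* version (not the current date)."""
    cur_year, cur_month = (int(p) for p in current.split(".")[:2])
    new_year, new_month = (int(p) for p in latest.split(".")[:2])
    months_behind = (new_year * 12 + new_month) - (cur_year * 12 + cur_month)
    return months_behind > max_behind


# Example: 31 releases behind -> unsupported; updating to the latest
# known version always brings the system back to supported.
assert core_version_outdated("2023.1.0", "2025.8.1") is True
assert core_version_outdated("2025.8.1", "2025.8.1") is False
```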
* Improve version comparison logic
* Use Home Assistant Core instead of just Core
Avoid any ambiguity in what is exactly outdated/unsupported by using
Home Assistant Core instead of just Core.
* Sort const alphabetically
* Update tests/resolution/evaluation/test_evaluate_home_assistant_core_version.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Allow arbitrarily nested addon config schemas
* Disallow lists directly nested in another list in addon schema
* Handle arbitrarily nested addon schemas in UiOptions class
* Handle arbitrarily nested addon schemas in AddonOptions class
* Add tests for addon config schemas
* Add tests for addon option validation
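Illustratively, the resulting rules allow something like the following hypothetical schema (using the addon schema's usual type strings):
```python
# Dicts and lists may now nest arbitrarily; the one restriction is that
# a list may not directly contain another list.
schema = {
    "server": {
        "hosts": ["str"],                        # list of strings: fine
        "tls": {"cert": "str?", "key": "str?"},  # nested dict: fine
        "endpoints": [{"path": "str", "methods": ["str"]}],  # list of dicts: fine
    },
    # "matrix": [["int"]],  # list directly inside a list: rejected
}
```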
* Add endpoint for complete logs of the latest container startup
Add an endpoint that returns the complete logs of the latest startup of a
container, which can be used for downloading Core logs in the frontend.
The realtime filtering header is used for the Journal API, and the StartedAt
parameter from the Docker API is used as the reference point. This means
that any other Range header is ignored for this parameter, yet the
"lines" query argument can be used to limit the number of lines. By
default an "infinite" number of lines is returned.
Closes #6147
* Implement fallback for latest logs for OS older than 16.0
Implement fallback which uses the internal CONTAINER_LOG_EPOCH metadata
added to logs created by the Docker logger. Still prefer the time-based
method, as it has lower overhead and uses public APIs.
* Address review comments
* Only use CONTAINER_LOG_EPOCH for latest logs
As pointed out in the review comments, we might not be able to get the
StartedAt for add-ons that are not running. Thus we need to use the only
reliable mechanism available now, which is the container log epoch.
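A hedged sketch of what such a query could look like against systemd-journal-gatewayd, which treats URL query parameters as journald field matches; the socket path and the exact field name follow HAOS/Supervisor conventions and are assumptions here:
```python
import aiohttp


async def fetch_latest_logs(epoch: str, lines: int | None = None) -> str:
    """Fetch entries of the most recent container start by matching the
    CONTAINER_LOG_EPOCH field attached by the Docker logger."""
    connector = aiohttp.UnixConnector(path="/run/systemd-journal-gatewayd.sock")
    headers = {"Accept": "text/plain"}
    if lines is not None:
        # Empty cursor + negative skip = offset from the journal tail
        headers["Range"] = f"entries=:-{lines - 1}:{lines}"
    async with aiohttp.ClientSession(connector=connector) as session:
        async with session.get(
            "http://localhost/entries",
            params={"CONTAINER_LOG_EPOCH": epoch},
            headers=headers,
        ) as resp:
            return await resp.text()
```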
* Remove dead code for 'Range: realtime' header handling
* Write cidfiles of Docker containers and mount them individually to /run/cid
There is no standard way to get the container ID in the container
itself, which can be needed for instance for #6006. The usual pattern is
to use the --cidfile argument of the Docker CLI and mount the generated
file to the container. However, this is a feature of the Docker CLI, and
we can't use it when creating the containers via the API. To get the
container ID for implementing native logging in e.g. Core as well, we
need the help of the Supervisor.
This change implements a similar feature fully in Supervisor's DockerAPI
class, which orchestrates the lifetime of all containers managed by the
Supervisor. The files are created in the SUPERVISOR_DATA directory, as
they need to be persisted between reboots, just as the instances of
Docker containers are.
Supervisor's cidfile must be created when starting the Supervisor
itself, for that see home-assistant/operating-system#4276.
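A sketch of the pattern under these assumptions (paths and names are illustrative): the host-side path is deterministic by container name, so the bind mount can be declared at create time, and the ID is written before start:
```python
from pathlib import Path

import docker

RUN_CID = "/run/cid"  # in-container path from the commit above

client = docker.from_env()
cid_file = Path("/mnt/data/supervisor/cid") / "addon_example.cid"  # illustrative
cid_file.parent.mkdir(parents=True, exist_ok=True)

container = client.containers.create(
    "alpine:3.20",
    command=["cat", RUN_CID],
    volumes={str(cid_file): {"bind": RUN_CID, "mode": "ro"}},
)
# The ID exists only after create; write it before start so Docker
# bind-mounts a regular file rather than auto-creating a directory.
cid_file.write_text(container.id)
container.start()
```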
* Address review comments, fix mounting of the cidfile
* Store and persist OS upgrade map to fix update path evaluation
The existing logic calculated OS upgrade paths inline during fetch_data,
which does not get re-evaluated while the current OS is unsupported
(JobCondition.OS_SUPPORTED). E.g. after updating from 11.4 to 11.5, the
system wouldn't offer the next available update (15.2) because the
upgrade path calculation relied on fresh data from the blocked fetch
operation.
Changes:
- Add ATTR_HASSOS_UPGRADE constant and schema validation
- Store hassos-upgrade map from version JSON in updater data
- Refactor version_hassos property to use stored upgrade map instead of
inline calculation during fetch_data
- Maintain upgrade path logic: upgrade within major version first, then
jump to next major version when at the latest in current major
- Add type safety checks for version.major access
This ensures upgrade paths work correctly even when update data refresh
is blocked due to unsupported OS versions, fixing the scenario where
HAOS 11.5 wouldn't show 15.2 as the next available update.
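A sketch of the lookup logic described above; the shape of the hassos-upgrade map (major version mapped to the latest release of that major) is an assumption based on this description:
```python
def version_hassos(current: str, latest: str, upgrade_map: dict[str, str]) -> str:
    """Pick the next OS version from a persisted upgrade map:
    finish the current major first, then jump to the next mapped
    major, falling back to the overall latest version."""
    major = int(current.split(".")[0])
    within = upgrade_map.get(str(major))
    if within and within != current:
        return within
    next_majors = sorted(int(m) for m in upgrade_map if int(m) > major)
    return upgrade_map[str(next_majors[0])] if next_majors else latest
```
With a map like `{"11": "11.5", "15": "15.2"}`, a system on 11.4 is first offered 11.5 and only then 15.2, matching the scenario above.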
* Update supervisor/updater.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Address mypy issue
* Fix pytest
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Add availability API for addons
* Add cast back and test for latest version of installed addon
* Make error responses more translation/client library friendly
* Add test cases for install/update APIs
Running tests in a UTC+2 timezone makes some of the tests fail because the
mocked time in the future is actually in the past, as UTC is used as the
new reference point. Adjust the tests to also mock the time when the
first execution of the function happens.
Instances where the second execution happened "immediately" were mocked
to happen 1 ms later. The 1 ms delta also needs to be added when
mocking the time 1 h in the future, otherwise that call is throttled too.
* Add background option to update/install APIs
* Refactor to use common background_task utility in backups too
* Use a validation_complete event rather than looking for bus events
* Handle missing type attribute in add-on map config
Handle missing type attribute in the add-on `map` configuration key.
* Make sure wrong volumes are cleared in any case
Also add a warning when a string mapping is rejected.
* Add unit tests
* Improve test coverage
* Stop refreshing the update information on outdated OS versions
Add `JobCondition.OS_SUPPORTED` to the updater job to avoid
refreshing update information when the OS version is unsupported.
This effectively freezes installations on unsupported OS versions
and blocks Supervisor updates. Once deployed, this ensures that any
Supervisor will always run on at least the minimum supported OS
version.
This requires moving the OS version check before the Supervisor updater
initialization to allow `JobCondition.OS_SUPPORTED` to work
correctly.
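For illustration, the condition attaches roughly like this (modeled on Supervisor's Job decorator; treat the exact signature as an assumption):
```python
from supervisor.jobs.decorator import Job, JobCondition


class Updater:
    @Job(name="updater_fetch_data", conditions=[JobCondition.OS_SUPPORTED])
    async def fetch_data(self) -> None:
        """Refresh update information; skipped (condition fails) while
        the OS version is unsupported, freezing the installation."""
```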
* Run only OS version check in setup loads
Instead of running a full system evaluation, only run the OS version
check right after the OS manager is loaded. This allows the
updater job condition to work correctly without running the full
system evaluation, which is not needed at this point.
* Prevent Core and Add-on updates on unsupported OS versions
Also prevent Home Assistant Core and Add-on updates on unsupported OS
versions. We could imply `JobCondition.SUPERVISOR_UPDATED` whenever
OS is outdated, but this would also prevent the OS update itself. So
we need this separate condition everywhere where
`JobCondition.SUPERVISOR_UPDATED` is used except for OS updates.
It should also be safe to let the add-on store update; we simply
don't allow add-ons to be installed or updated if the OS is
outdated.
* Remove unnecessary Host info update
It seems that the CPE information is already loaded in the HostInfo
object. Remove the unnecessary update call.
* Fix pytest
* Delay refreshing of update data
Delay refreshing of update data until after the setup phase. This allows
using JobCondition.OS_SUPPORTED safely. We still have to fetch the
updater data in case the OS information is outdated. This typically happens
on device wipe.
Note also that plug-ins will automatically refresh updater data in case
it is missing the latest version information.
This will reverse the order of updates when there are new plug-in and
Supervisor update information available (e.g. on first startup):
Previously the updater data got refreshed before the plug-ins started,
which caused them to update first. Then the Supervisor got updated in the
startup phase. Now the updater data gets refreshed in the startup phase,
which causes the Supervisor to update first, before the plug-ins
get updated after the Supervisor restart.
* Fix pytest
* Fix updater tests
* Add new tests to verify that updater reload is skipped
* Fix pylint
* Apply suggestions from code review
Co-authored-by: Mike Degatano <michael.degatano@gmail.com>
* Add debug message when we delay version fetch
---------
Co-authored-by: Mike Degatano <michael.degatano@gmail.com>
* Fix NetworkManager connection name for VLANs
The connection name for VLANs should include the parent interface name
for better identification. This was originally the intention, but the
interface object's name property was used, which appears to be empty at
that point.
* Disallow creating multiple connections for the same VLAN id
Only allow a single connection per interface and VLAN id. The regular
network commands can be used to alter the configuration.
* Fix pytest
* Simplify connection id name generation
Always rely on the Supervisor interface representation's name attribute
to generate the NetworkManager connection id. Make sure that the name
is correctly set when creating VLAN interfaces as well.
* Special case VLAN configuration
We can't use the match information when comparing the Supervisor interface
representation with D-Bus representations. Special-case VLANs and
compare using the VLAN ID and parent interface.
Note that this currently compares connection UUID of the parent
interface.
* Fix pytest
* Separate VLAN creation logic from apply_changes
Apply changes is really all about updating the NetworkManager settings
of a particular network interface. The base in apply_changes() is
NetworkInterface class, which is the NetworkManager Device abstraction.
All physical interfaces have such a Device hence it is always present.
The only exception is when creating a VLAN: Since it is a virtual
device, there is no device when creating a VLAN.
This separates the two cases, which makes it much easier to reason about
whether a VLAN already exists or not, and to handle the case where a VLAN
needs to be created.
For all other network interfaces, the apply_changes() method can
now rely on the presence of the NetworkInterface Device abstraction.
* Add VLAN test interface and VLAN exists test
Add a test which checks that an error gets raised when a VLAN for a
particular interface/id combination already exists.
* Address pylint
* Fix test_ignore_veth_only_changes pytest
* Make VLAN interface disabled to avoid test issues
* Reference setting 38 in mocked connection
* Make sure interface type matches
Require an interface type match before doing any comparison.
* Add Supervisor host network configuration tests
* Fix device type checking
* Fix pytest
* Fix tests by taking VLAN interface into account
* Fix test_load_with_network_connection_issues
This seems like a hack, but it turns out that the additional active
connection caused coresys.host.network.update() to be called, which
implicitly "fake" activated the connection. Now it seems that our
mocking causes IPv4 gateway to be set.
So in a way, the test checked a particular mock behavior instead of
actual intention.
The crucial part of this test is that we make sure the settings remain
unchanged. This is done by ensuring that the the method is still auto.
* Fix test_check_network_interface_ipv4.py
Now that we have the VLAN interface active too it will raise an issue
as well.
* Apply suggestions from code review
Co-authored-by: Mike Degatano <michael.degatano@gmail.com>
* Fix ruff check issue
---------
Co-authored-by: Mike Degatano <michael.degatano@gmail.com>
* Send progress updates during image pull for install/update
* Add extra to tests about job APIs
* Send out-of-date progress to Sentry and combine done event
* Pulling container image layer
* Fix docker_config check to ignore Docker VOLUME mounts
Only validate /media and /share mounts that are explicitly configured
in add-on map_volumes, not those created by Docker VOLUME statements.
* Check and test with custom map targets
* Optimize directory_missing_or_empty function
Replace the inefficient os.listdir() with os.scandir() and next() to check
whether a directory is empty. This avoids reading the entire directory
contents into memory when we only need to know if any entry exists.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
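The optimized check boils down to something like this sketch:
```python
import os


def directory_missing_or_empty(path: str) -> bool:
    """Return True if path is not a directory or has no entries.

    os.scandir with next() stops at the first entry instead of
    materializing the whole listing like os.listdir would.
    """
    if not os.path.isdir(path):
        return True
    with os.scandir(path) as entries:
        return next(entries, None) is None
```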
* Add unit tests for directory_missing_or_empty function
Add comprehensive test coverage for the optimized directory_missing_or_empty
function, testing empty directories, directories with content, non-existent
paths, and files (non-directories).
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
* Apply suggestions from code review
Co-authored-by: Mike Degatano <michael.degatano@gmail.com>
---------
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Mike Degatano <michael.degatano@gmail.com>
* Propagate timezone setting to host in OS 16.2 and newer
With home-assistant/operating-system#4224, the timezone can be
persistently set in HAOS as well. Propagate the timezone configured in the
Supervisor config (which can be changed through the general system settings
in HA Core) through the D-Bus API for setting the timezone.
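A minimal sketch of the underlying D-Bus call via systemd-timedated (Supervisor's own D-Bus layer differs; this just illustrates the API being used, here with dbus-fast):
```python
from dbus_fast import BusType
from dbus_fast.aio import MessageBus


async def set_host_timezone(timezone: str) -> None:
    """Propagate a timezone to the host via org.freedesktop.timedate1."""
    bus = await MessageBus(bus_type=BusType.SYSTEM).connect()
    introspection = await bus.introspect(
        "org.freedesktop.timedate1", "/org/freedesktop/timedate1"
    )
    proxy = bus.get_proxy_object(
        "org.freedesktop.timedate1", "/org/freedesktop/timedate1", introspection
    )
    timedated = proxy.get_interface("org.freedesktop.timedate1")
    # Second argument: allow polkit interactive authorization (False here)
    await timedated.call_set_timezone(timezone, False)
```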
* Persist timezone also when it's been obtained from Whoami
* Suppress pylint fixme error
* Storage space usage API
* Move to host API
* add tests
* fix test url
* more tests
* fix tests
* fix test
* PR comments
* update test
* tweak format and url
* add .DS_Store to .gitignore
* update tests
* test coverage
* update to new struct
* update test
Not all disks have all SMART attributes available, e.g. Sentry showed
devices with missing "wctemp". In practice, any SMART attribute could
be missing. Make sure we handle this gracefully.
* Use context manager for Job concurrency control
* Allow releasing the lock outside of the Job running context
* Improve JobGroup locking with external ownership tracking
Track lock ownership by job UUID instead of execution context. This
allows external lock release via job parameter.
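A condensed sketch of the ownership-tracking idea (not Supervisor's actual implementation):
```python
import asyncio
from contextlib import asynccontextmanager


class GroupLock:
    """Lock whose ownership is tracked by job UUID, so the owning
    job can also release it from outside the running context."""

    def __init__(self) -> None:
        self._lock = asyncio.Lock()
        self._owner: str | None = None

    @asynccontextmanager
    async def acquire(self, job_uuid: str):
        await self._lock.acquire()
        self._owner = job_uuid
        try:
            yield
        finally:
            if self._owner == job_uuid:  # may have been released externally
                self.release(job_uuid)

    def release(self, job_uuid: str) -> None:
        """External release, permitted only for the owning job."""
        if self._owner == job_uuid and self._lock.locked():
            self._owner = None
            self._lock.release()
```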
* Fix acquire lock in nested Jobs
* Simplify nested lock tracking
* Simplify Job group lock acquisition logic
* Simplify by using helper methods
* Allow throttling with group concurrency
* Use Lock instead of Semaphore for job concurrency control
Use the same synchronization primitive (Lock) for job concurrency
control as used in job groups.
* Go back to lock ownership tracking with references
* Drop unused property `active_job_id`
* Drop unused property `can_acquire`
* Replace assert with cast
* Add unsupported reason os_version and evaluation
* Order enum and add tests
* Apply suggestions from code review
* Apply suggestions from code review
---------
Co-authored-by: Stefan Agner <stefan@agner.ch>
This avoids displaying a 10% lifetime use for a brand new
eMMC storage. The values are estimates anyway, and there is a
separate value which represents completely used lifetime (100%).
* Block OS updates when the system is unhealthy
In #6024 we mark a system as unhealthy when multiple OS installations
are found. The idea was to block OS updates in this case. However, it
turns out that the OS update job was not checking the system health
and thus allowed updates even when the system was marked as unhealthy.
This commit adds the `JobCondition.HEALTHY` condition to the OS update
job, ensuring that OS updates are only performed when the system is
healthy.
Users can still force an OS update by using
`ha jobs options --ignore-conditions healthy`.
* Add test for update of unhealthy system
---------
Co-authored-by: Jan Čermák <sairon@sairon.cz>
* Split execution limit in concurrency and throttle parameters
Currently the execution limit combines two orthogonal features: limiting
concurrency and throttling execution. This change separates the two
features, allowing for more flexible configuration of job execution.
Ultimately I want to get rid of the old limit parameter. But for ease
of review and migration, I'd like to do this in two steps: First
introduce the new parameters, and map the old limit parameters to the
new parameters. Then, in a second step, remove the old limit parameter
and migrate all users to the new concurrency and throttle parameters
as needed.
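Hedged sketch of how the split parameters might look on a job; the enum and parameter names are inferred from these commit messages and may not match the final code exactly:
```python
from datetime import timedelta

# Import paths and names are assumptions based on the commit text.
from supervisor.jobs.const import JobConcurrency, JobThrottle
from supervisor.jobs.decorator import Job


class Updater:
    @Job(
        name="updater_fetch_data",
        concurrency=JobConcurrency.REJECT,  # never run two at once
        throttle=JobThrottle.THROTTLE,      # silently skip if called too soon
        throttle_period=timedelta(hours=1),
    )
    async def fetch_data(self) -> None:
        ...
```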
* Introduce common lock release method
* Fix THROTTLE_WAIT behavior
The QUEUE concurrency setting does not actually queue executions that hit
throttle limits.
* Add documentation for new concurrency/throttle Job options
* Handle group options for concurrency and throttle separately
* Fix GROUP_THROTTLE_WAIT concurrency setting
We need to use the QUEUE concurrency setting instead of GROUP_QUEUE
for the GROUP_THROTTLE_WAIT execution limit. Otherwise the
test_jobs_decorator.py::test_execution_limit_group_throttle_wait
test deadlocks.
The reason this deadlocks is that GROUP_QUEUE concurrency doesn't
really work here: we can only release a group lock if the job is
actually running.
Or put differently, throttling isn't supported with GROUP_*
concurrency options.
* Prevent using any throttling with group concurrency
The group concurrency modes (reject and queue) are not compatible with
any throttling, since we currently can't unlock the group lock when
a job doesn't get started (which is the case when throttling is
applied).
* Fix commit in group rate limit
* Explain the deadlock issue with group locks in code
* Handle locking correctly on throttle limit exceptions
* Introduce pytest for new job decorator combinations
* Enable IPv6 by default for new installations
Enable IPv6 by default for new Supervisor installations. Let's also
make the `enable_ipv6` attribute nullable, so we can distinguish
between "not set" and "set to false".
* Add pytest
* Add log message that system restart is required for IPv6 changes
* Fix API pytest
* Create resolution center issue when reboot is required
* Order log after actual setter call
* Add resolution check for duplicate OS installations
* Only create single issue/use separate unhealthy type
* Check MBR partition UUIDs as well
* Use partlabel
* Use generator to avoid code duplication
* Add list of devices, avoid unnecessary exception handling
* Run check only on HAOS
* Fix message formatting
* Fix and simplify pytests
* Fix UnhealthyReason sort order