Commit Graph

702 Commits

Author SHA1 Message Date
Michel Nederlof
b92f5976a3 If D-Bus to UDisks2 is not connected, return None (#6265)
When fetching the host info without UDisks2 D-Bus connected it would raise an exception and disable managing addons via home assistant (and other GUI functions that need host info status).
2025-10-24 10:22:02 +02:00
Jan Čermák
ddb8588d77 Treat containerd snapshotter/overlayfs driver as supported (#6242)
* Treat containerd snapshotter/overlayfs driver as supported

With home-assistant/operating-system#4252 the storage driver would
change to "overlayfs". We don't want the system to be marked as
unsupported. It should be safe to treat it as supported even now, so add
it to the list of allowed values.

* Flip the logic

(note for self: don't forget to check for unstaged changes before push)

* Set valid storage for invalid logging test case
2025-10-09 18:47:06 +02:00
Stefan Agner
53a8044aff Add support for ulimit in addon config (#6206)
* Add support for ulimit in addon config

Similar to docker-compose, this adds support for setting ulimits
for addons via the addon config. This is useful e.g. for InfluxDB
which on its own does not support setting higher open file descriptor
limits, but recommends increasing limits on the host.

* Make soft and hard limit mandatory if ulimit is a dict
2025-10-08 12:43:12 +02:00
Mike Degatano
190b734332 Add progress reporting to addon, HA and Supervisor updates (#6195)
* Add progress reporting to addon, HA and Supervisor updates

* Fix assert in test

* Add progress to addon, core, supervisor updates/installs

* Fix double install bug in addons install

* Remove initial_install and re-arrange order of load
2025-10-07 16:54:11 +02:00
Stefan Agner
78a2e15ebb Replace non-UTF-8 characters in log messages (#6227) 2025-10-02 10:35:50 +02:00
Stefan Agner
f3e1e0f423 Fix CID file handling to prevent directory creation (#6225)
* Fix CID file handling to prevent directory creation

It seems that under certain conditions Docker creates a directory
instead of a file for the CID file. This change ensures that
the CID file is always created as a file, and any existing directory
is removed before creating the file.

* Fix tests

* Fix pytest
2025-10-02 09:24:19 +02:00
Mike Degatano
64f94a159c Add progress syncing from child jobs (#6207)
* Add progress syncing from child jobs

* Fix pylint issue

* Set initial progress from parent and end at 100
2025-09-30 14:52:16 -04:00
Stefan Agner
fabfe760fb Fix flaky test_get_dir_structure_sizes_ebadmsg_error (#6211) 2025-09-25 12:29:18 +02:00
Stefan Agner
97c7686b95 Simplify API validation by removing confusing origin parameter (#6203)
Co-authored-by: Claude <noreply@anthropic.com>
2025-09-23 22:10:48 +02:00
Mike Degatano
42f93d0176 Remove message_template field from errors (#6205) 2025-09-23 17:07:38 +02:00
Stefan Agner
ed7155604c Fix range header to correctly fetch latest logs (#6202)
* Fix range header to correctly fetch latest logs

Add a colon before line numbers to indicate that no cursor is used.
This makes the range header work when fetching latest logs from
systemd-journal-gatewayd.

* Fix pytest
2025-09-23 16:43:20 +02:00
Mike Degatano
9f5bebd0eb Mount falls back on restart if reload fails (#6197) 2025-09-19 17:59:50 +02:00
Stefan Agner
c712d3cc53 Check Core version and raise unsupported if older than 2 years (#6148)
* Check Core version and raise unsupported if older than 2 years

Check the currently installed Core version relative to the current
date, and if its older than 2 years, mark the system unsupported.
Also add a Job condition to prevent automatic refreshing of the update
information in this case.

* Handle landing page correctly

* Handle non-parseable versions gracefully

Also align handling between OS and Core version evaluations.

* Extend and fix test coverage

* Improve Job condition error

* Fix pytest

* Block execution of fetch_data and store reload jobs

Block execution of fetch_data and store reload jobs if the core version
is unsupported. This essentially freezes the installation until the
user takes action and updates the Core version to a supported one.

* Use latest known Core version as reference

Instead of using current date to determine if Core version is more than
2 years old, use the latest known Core version as reference point and
check if current version is more than 24 releases behind.

This is crucial because when update information refresh is disabled due to
unsupported Core version, using date would create a permanent unsupported
state. Even if users update to the last known version in 4+ years, the
system would remain unsupported. By using latest known version as reference,
updating Core to the last known version makes the system supported again,
allowing update information refresh to resume.

This ensures users can always escape the unsupported state by updating
to the last known Core version, maintaining the update refresh cycle.

* Improve version comparision logic

* Use Home Assistant Core instead of just Core

Avoid any ambiguity in what is exactly outdated/unsupported by using
Home Assistant Core instead of just Core.

* Sort const alphabetically

* Update tests/resolution/evaluation/test_evaluate_home_assistant_core_version.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-09-19 17:58:37 +02:00
Mike Degatano
01911a44cd Persistent notifications to repairs and fix free_space check (#6179)
* Persistent notifications to repairs and fix free_space check

* Fix tests mocking too little free space
2025-09-16 11:22:59 -04:00
Lukas Waslowski
857dae7736 Allow adding translations for nested fields to support home-assistant/frontend#26997 (#6180) 2025-09-16 11:36:45 +02:00
Lukas Waslowski
ac9947d599 Allow deeply nested dicts and lists in addon config schemas (#6171)
* Allow arbitrarily nested addon config schemas

* Disallow lists directly nested in another list in addon schema

* Handle arbitrarily nested addon schemas in UiOptions class

* Handle arbitrarily nested addon schemas in AddonOptions class

* Add tests for addon config schemas

* Add tests for addon option validation
2025-09-16 11:32:28 +02:00
Jan Čermák
2e22e1e884 Add endpoint for complete logs of the latest container startup (#6163)
* Add endpoint for complete logs of the latest container startup

Add endpoint that returns complete logs of the latest startup of
container, which can be used for downloading Core logs in the frontend.

Realtime filtering header is used for the Journal API and StartedAt
parameter from the Docker API is used as the reference point. This means
that any other Range header is ignored for this parameter, yet the
"lines" query argument can be used to limit the number of lines. By
default "infinite" number of lines is returned.

Closes #6147

* Implement fallback for latest logs for OS older than 16.0

Implement fallback which uses the internal CONTAINER_LOG_EPOCH metadata
added to logs created by the Docker logger. Still prefer the time-based
method, as it has lower overhead and using public APIs.

* Address review comments

* Only use CONTAINER_LOG_EPOCH for latest logs

As pointed out in the review comments, we might not be able to get the
StartedAt for add-ons that are not running. Thus we need to use the only
reliable mechanism available now, which is the container log epoch.

* Remove dead code for 'Range: realtime' header handling
2025-09-16 11:29:28 +02:00
Jan Čermák
bbb9469c1c Write cidfiles of Docker containers and mount them individually to /run/cid (#6154)
* Write cidfiles of Docker containers and mount them individually to /run/cid

There is no standard way to get the container ID in the container
itself, which can be needed for instance for #6006. The usual pattern is
to use the --cidfile argument of Docker CLI and mount the generated file
to the container. However, this is feature of Docker CLI and we can't
use it when creating the containers via API. To get container ID to
implement native logging in e.g. Core as well, we need the help of the
Supervisor.

This change implements similar feature fully in Supervisor's DockerAPI
class that orchestrates lifetime of all containers managed by
Supervisor. The files are created in the SUPERVISOR_DATA directory, as
it needs to be persisted between reboots, just as the instances of
Docker containers are.

Supervisor's cidfile must be created when starting the Supervisor
itself, for that see home-assistant/operating-system#4276.

* Address review comments, fix mounting of the cidfile
2025-09-09 13:38:31 +02:00
Stefan Agner
c277f3cad6 Store and persist OS upgrade map to fix update path evaluation (#6152)
* Store and persist OS upgrade map to fix update path evaluation

The existing logic calculated OS upgrade paths inline during fetch_data,
which will not get reevaluted when the current OS is unsupported
(JobCondition.OS_SUPPORTED). E.g. after updating from 11.4 to 11.5, the
system wouldn't offer the next available update (15.2) because the
upgrade path calculation relied on fresh data from the blocked fetch
operation.

Changes:
- Add ATTR_HASSOS_UPGRADE constant and schema validation
- Store hassos-upgrade map from version JSON in updater data
- Refactor version_hassos property to use stored upgrade map instead of
  inline calculation during fetch_data
- Maintain upgrade path logic: upgrade within major version first, then
  jump to next major version when at the latest in current major
- Add type safety checks for version.major access

This ensures upgrade paths work correctly even when update data refresh
is blocked due to unsupported OS versions, fixing the scenario where
HAOS 11.5 wouldn't show 15.2 as the next available update.

* Update supervisor/updater.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Address mypy issue

* Fix pytest

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-09-04 13:19:31 +02:00
Igor Yamolov
236c39cbb0 Add network interface settings for mDNS/LLMNR (#5520) 2025-09-04 13:18:11 +02:00
Mike Degatano
7ed83a15fe Add availability API for addons (#6140)
* Add availability API for addons

* Add cast back and test for latest version of installed addon

* Make error responses more translation/client library friendly

* Add test cases for install/update APIs
2025-09-04 11:14:42 +02:00
Jan Čermák
3d62c9afb1 Make test_job_decorator tests timezone agnostic (#6153)
Running tests in UTC+2 timezone makes some of the tests fail because the
mocked time in the future is actually in the past, as UTC is used as the
new reference point. Adjust the tests to mock also the time when the
first execution of function happens.

Instances where the second execution happened "immediately" were mocked
to happen 1ms later. The 1ms delta is also needed to be added when
mocking time 1h in the future, otherwise it will be throttled too.
2025-09-03 17:55:28 +02:00
Mike Degatano
9392d10625 Add background option to update/install APIs (#6134)
* Add background option to update/install APIs

* Refactor to use common background_task utility in backups too

* Use a validation_complete event rather then looking for bus events
2025-09-03 08:33:00 +02:00
Mike Degatano
78be155b94 Handle download retart in pull progres log (#6131) 2025-08-25 23:20:00 +02:00
Mike Degatano
9900dfc8ca Do not skip messages in pull progress log due to rounding (#6129) 2025-08-25 22:25:38 +02:00
Stefan Agner
3a1ebc9d37 Handle malformed addon map entries gracefully (#6126)
* Handle missing type attribute in add-on map config

Handle missing type attribute in the add-on `map` configuration key.

* Make sure wrong volumes are cleared in any case

Also add warning when string mapping is rejected.

* Add unit tests

* Improve test coverage
2025-08-25 22:24:46 +02:00
Stefan Agner
2d12920b35 Stop refreshing the update information on outdated OS versions (#6098)
* Stop refreshing the update information on outdated OS versions

Add `JobCondition.OS_SUPPORTED` to the updater job to avoid
refreshing update information when the OS version is unsupported.

This effectively freezes installations on unsupported OS versions
and blocks Supervisor updates. Once deployed, this ensures that any
Supervisor will always run on at least the minimum supported OS
version.

This requires to move the OS version check before Supervisor updater
initialization to allow the `JobCondition.OS_SUPPORTED` to work
correctly.

* Run only OS version check in setup loads

Instead of running a full system evaluation, only run the OS version
check right after the OS manager is loaded. This allows the
updater job condition to work correctly without running the full
system evaluation, which is not needed at this point.

* Prevent Core and Add-on updates on unsupported OS versions

Also prevent Home Assistant Core and Add-on updates on unsupported OS
versions. We could imply `JobCondition.SUPERVISOR_UPDATED` whenever
OS is outdated, but this would also prevent the OS update itself. So
we need this separate condition everywhere where
`JobCondition.SUPERVISOR_UPDATED` is used except for OS updates.

It should also be safe to let the add-on store update, we simply
don't allow the add-on to be installed or updated if the OS is
outdated.

* Remove unnecessary Host info update

It seems that the CPE information are already loaded in the HostInfo
object. Remove the unnecessary update call.

* Fix pytest

* Delay refreshing of update data

Delay refreshing of update data until after setup phase. This allows to
use the JobCondition.OS_SUPPORTED safely. We still have to fetch the
updater data in case OS information is outdated. This typically happens
on device wipe.

Note also that plug-ins will automatically refresh updater data in case
it is missing the latest version information.

This will reverse the order of updates when there are new plug-in and
Supervisor update information available (e.g. on first startup):
Previously the updater data got refreshed before the plug-in started,
which caused them to update first. Then the Supervisor got update in
startup phase. Now the updater data gets refreshed in startup phase,
which then causes the Supervisor to update first before the plug-ins
get updated after Supervisor restart.

* Fix pytest

* Fix updater tests

* Add new tests to verify that updater reload is skipped

* Fix pylint

* Apply suggestions from code review

Co-authored-by: Mike Degatano <michael.degatano@gmail.com>

* Add debug message when we delay version fetch

---------

Co-authored-by: Mike Degatano <michael.degatano@gmail.com>
2025-08-22 11:09:56 +02:00
Stefan Agner
8a95113ebd Improve VLAN configuration (#6094)
* Fix NetworkManager connection name for VLANs

The connection name for VLANs should include the parent interface name
for better identification. This was originally the intention, but the
interface object's name property was used which appears empty at that
point.

* Disallow creating multiple connections for the same VLAN id

Only allow a single connection per interface and VLAN id. The regular
network commands can be used to alter the configuration.

* Fix pytest

* Simply connection id name generation

Always rely on the Supervisor interface representation's name attribute
to generate the NetworkManager connection id. Make sure that the name
is correctly set when creating VLAN interfaces as well.

* Special case VLAN configuration

We can't use the match information when comparing Supervisor interface
representation with D-Bus representations. Special case VLAN and
compare using VLAN ID and parent interface.

Note that this currently compares connection UUID of the parent
interface.

* Fix pytest

* Separate VLAN creation logic from apply_changes

Apply changes is really all about updating the NetworkManager settings
of a particular network interface. The base in apply_changes() is
NetworkInterface class, which is the NetworkManager Device abstraction.
All physical interfaces have such a Device hence it is always present.

The only exception is when creating a VLAN: Since it is a virtual
device, there is no device when creating a VLAN.

This separate the two cases. This makes it much easier to reason if
a VLAN already exists or not, and to handle the case where a VLAN
needs to be created.

For all other network interfaces, the apply_changes() method can
now rely on the presence of the NetworkInterface Device abstraction.

* Add VLAN test interface and VLAN exists test

Add a test which checks that an error gets raised when a VLAN for a
particular interface/id combination already exists.

* Address pylint

* Fix test_ignore_veth_only_changes pytest

* Make VLAN interface disabled to avoid test issues

* Reference setting 38 in mocked connection

* Make sure interface type matches

Require a interface type match before doing any comparision.

* Add Supervisor host network configuration tests

* Fix device type checking

* Fix pytest

* Fix tests by taking VLAN interface into account

* Fix test_load_with_network_connection_issues

This seems like a hack, but it turns out that the additional active
connection caused coresys.host.network.update() to be called, which
implicitly "fake" activated the connection. Now it seems that our
mocking causes IPv4 gateway to be set.

So in a way, the test checked a particular mock behavior instead of
actual intention.

The crucial part of this test is that we make sure the settings remain
unchanged. This is done by ensuring that the the method is still auto.

* Fix test_check_network_interface_ipv4.py

Now that we have the VLAN interface active too it will raise an issue
as well.

* Apply suggestions from code review

Co-authored-by: Mike Degatano <michael.degatano@gmail.com>

* Fix ruff check issue

---------

Co-authored-by: Mike Degatano <michael.degatano@gmail.com>
2025-08-22 11:09:39 +02:00
Mike Degatano
207b665e1d Send progress updates during image pull for install/update (#6102)
* Send progress updates during image pull for install/update

* Add extra to tests about job APIs

* Sent out of date progress to sentry and combine done event

* Pulling container image layer
2025-08-22 10:41:10 +02:00
Stefan Agner
1fb15772d7 Fix docker_config check for add-ons (#6119)
* Fix docker_config check to ignore Docker VOLUME mounts

Only validate /media and /share mounts that are explicitly configured
in add-on map_volumes, not those created by Docker VOLUME statements.

* Check and test with custom map targets
2025-08-22 10:38:41 +02:00
Stefan Agner
d95ca401ec Fix git path missing or empty (#6116)
* Optimize directory_missing_or_empty function

Replace inefficient os.listdir() with os.scandir() and next() to check
if directory is empty. This avoids reading entire directory contents
into memory when we only need to know if any entry exists.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Add unit tests for directory_missing_or_empty function

Add comprehensive test coverage for the optimized directory_missing_or_empty
function, testing empty directories, directories with content, non-existent
paths, and files (non-directories).

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Apply suggestions from code review

Co-authored-by: Mike Degatano <michael.degatano@gmail.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Mike Degatano <michael.degatano@gmail.com>
2025-08-20 17:53:30 +02:00
dependabot[bot]
07d8fd006a Bump time-machine from 2.17.0 to 2.18.0 (#6113)
* Bump time-machine from 2.17.0 to 2.18.0

Bumps [time-machine](https://github.com/adamchainz/time-machine) from 2.17.0 to 2.18.0.
- [Changelog](https://github.com/adamchainz/time-machine/blob/main/docs/changelog.rst)
- [Commits](https://github.com/adamchainz/time-machine/compare/2.17.0...2.18.0)

---
updated-dependencies:
- dependency-name: time-machine
  dependency-version: 2.18.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* Fix time_machine usage

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Stefan Agner <stefan@agner.ch>
2025-08-20 10:37:29 +02:00
Jan Čermák
b49ce96df8 Propagate timezone setting to host in OS 16.2 and newer (#6099)
* Propagate timezone setting to host in OS 16.2 and newer

With home-assistant/operating-system#4224, timezone setting in OS can be
peristently set in HAOS as well. Propagate the timezone configured in
Supervisor config (which can be changed through general system settings
in HA Core) through the DBus API for setting the timezone.

* Persist timezone also when it's been obtained from Whoami

* Suppress pylint fixme error
2025-08-20 01:30:57 +02:00
Copilot
014082eda8 Create repair issue when user has deprecated add-on installed (#6110)
* Initial plan

* Add deprecated addon repair issue check

Co-authored-by: agners <34061+agners@users.noreply.github.com>

* Apply ruff format

* Add remove suggestion

Co-authored-by: Mike Degatano <michael.degatano@gmail.com>

* Update tests/resolution/check/test_check_deprecated_addon.py

Co-authored-by: Mike Degatano <michael.degatano@gmail.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: agners <34061+agners@users.noreply.github.com>
Co-authored-by: Stefan Agner <stefan@agner.ch>
Co-authored-by: Mike Degatano <michael.degatano@gmail.com>
2025-08-19 22:02:36 +02:00
Petar Petrov
2324b70084 Storage space usage API (#6046)
* Storage space usage API

* Move to host API

* add tests

* fix test url

* more tests

* fix tests

* fix test

* PR comments

* update test

* tweak format and url

* add .DS_Store to .gitignore

* update tests

* test coverage

* update to new struct

* update test
2025-08-19 10:54:53 +02:00
Mike Degatano
8a82b98e5b Improved error handling for docker image pulls (#6095)
* Improved error handling for docker image pulls

* Fix mocking in tests due to api use change
2025-08-13 18:05:27 +02:00
Copilot
fd205ce2ef Add Docker MTU configuration support for networks with non-standard MTU (#6079)
* Initial plan

* Implement Docker MTU support - core functionality

Co-authored-by: agners <34061+agners@users.noreply.github.com>

* Add comprehensive MTU tests and documentation

Co-authored-by: agners <34061+agners@users.noreply.github.com>

* Fix final linting issue in test file

Co-authored-by: agners <34061+agners@users.noreply.github.com>

* Apply suggestions from code review

* Implement reboot_required flag pattern and fix MyPy typing issue

Co-authored-by: agners <34061+agners@users.noreply.github.com>

* Update supervisor/api/docker.py

* Update supervisor/docker/manager.py

Co-authored-by: Mike Degatano <michael.degatano@gmail.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: agners <34061+agners@users.noreply.github.com>
Co-authored-by: Stefan Agner <stefan@agner.ch>
Co-authored-by: Mike Degatano <michael.degatano@gmail.com>
2025-08-12 09:19:12 +02:00
Stefan Agner
cad14bf46e Handle disks with non-existing SMART attributes (#6077)
Not all disks have all SMART attributes available, e.g. Sentry showed
devices with missing "wctemp". In practice, any SMART attribute could
be missing. Make sure we handle this gracefully.
2025-08-07 09:40:03 +02:00
Stefan Agner
3b093200e3 Improve JobGroup locking with external ownership tracking (#6074)
* Use context manager for Job concurrency control

* Allow to release lock outside of Job running context

* Improve JobGroup locking with external ownership tracking

Track lock ownership by job UUID instead of execution context. This
allows external lock release via job parameter.

* Fix acquire lock in nested Jobs

* Simplify nested lock tracking

* Simplify Job group lock acquisition logic

* Simplify by using helper methods

* Allow throttling with group concurrency

* Use Lock instead of Semaphore for job concurrency control

Use the same synchronization primitive (Lock) for job concurrency
control as used in job groups.

* Go back to lock ownership tracking with references

* Drop unused property `active_job_id`

* Drop unused property `can_acquire`

* Replace assert with cast
2025-08-07 00:14:58 +02:00
Mike Degatano
fdde95d849 Add an issue for disk lifetime >90% (#6069) 2025-08-06 10:40:48 +02:00
Mike Degatano
059b161f4f Use actual latest version of OS in os version check (#6067) 2025-08-05 22:23:23 +02:00
Stefan Agner
9bee58a8b1 Migrate to JobConcurrency and JobThrottle parameters (#6065) 2025-08-05 13:24:44 +02:00
Mike Degatano
8a1e6b0895 Add unsupported reason os_version and evaluation (#6041)
* Add unsupported reason os_version and evaluation

* Order enum and add tests

* Apply suggestions from code review

* Apply suggestions from code review

---------

Co-authored-by: Stefan Agner <stefan@agner.ch>
2025-08-05 13:23:42 +02:00
Stefan Agner
f150d1b287 Return optimistic life time estimate for eMMC storage (#6063)
This avoids that we display a 10% life time use for a brand new
eMMC storage. The values are estimates anyways, and there is a
separate value which represents life time completely used (100%).
2025-08-05 10:43:57 +02:00
Mike Degatano
033896480d Fix backup equal and add hash to objects with eq (#6059)
* Fix backup equal and add hash to objects with eq

* Add test for failed consolidate
2025-08-04 14:19:33 +02:00
Mike Degatano
22afa60f55 Get lifetime info for NVMe devices (#6056)
* Get lifetime info for NVMe devices

* Fix lint and test issues

* Update tests/dbus_service_mocks/udisks2_manager.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Stefan Agner <stefan@agner.ch>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-08-04 13:53:56 +02:00
Stefan Agner
27b092aed0 Block OS updates when the system is unhealthy (#6053)
* Block OS updates when the system is unhealthy

In #6024 we mark a system as unhealthy when multiple OS installations
were found. The idea was to block OS updates in this case. However, it
turns out that the OS update job was not checking the system health
and thus allowed updates even when the system was marked as unhealthy.

This commit adds the `JobCondition.HEALTHY` condition to the OS update
job, ensuring that OS updates are only performed when the system is
healthy.

Users can force an OS update still by using
`ha jobs options --ignore-conditions healthy`.

* Add test for update of unhealthy system

---------

Co-authored-by: Jan Čermák <sairon@sairon.cz>
2025-07-31 11:23:57 +02:00
Stefan Agner
6871ea4b81 Split execution limit in concurrency and throttle parameters (#6013)
* Split execution limit in concurrency and throttle parameters

Currently the execution limit combines two ortogonal features: Limit
concurrency and throttle execution. This change separates the two
features, allowing for more flexible configuration of job execution.

Ultimately I want to get rid of the old limit parameter. But for ease
of review and migration, I'd like to do this in two steps: First
introduce the new parameters, and map the old limit parameters to the
new parameters. Then, in a second step, remove the old limit parameter
and migrate all users to the new concurrency and throttle parameters
as needed.

* Introduce common lock release method

* Fix THROTTLE_WAIT behavior

The concurrency QUEUE does not really QUEUE throttle limits.

* Add documentation for new concurrency/throttle Job options

* Handle group options for concurrency and throttle separately

* Fix GROUP_THROTTLE_WAIT concurrency setting

We need to use the QUEUE concurrency setting instead of GROUP_QUEUE
for the GROUP_THROTTLE_WAIT execution limit. Otherwise the
test_jobs_decorator.py::test_execution_limit_group_throttle_wait
test deadlocks.

The reason this deadlocks is because GROUP_QUEUE concurrency doesn't
really work because we only can release a group lock if the job is
actually running.

Or put differently, throttling isn't supported with GROUP_*
concurrency options.

* Prevent using any throttling with group concurrency

The group concurrency modes (reject and queue) are not compatible with
any throttling, since we currently can't unlock the group lock when
a job doesn't get started (which is the case when throttling is
applied).

* Fix commit in group rate limit

* Explain the deadlock issue with group locks in code

* Handle locking correctly on throttle limit exceptions

* Introduce pytest for new job decorator combinations
2025-07-30 22:12:14 +02:00
Stefan Agner
7dcf5ba631 Enable IPv6 for containers on new installations (#6029)
* Enable IPv6 by default for new installations

Enable IPv6 by default for new Supervisor installations. Let's also
make the `enable_ipv6` attribute nullable, so we can distinguish
between "not set" and "set to false".

* Add pytest

* Add log message that system restart is required for IPv6 changes

* Fix API pytest

* Create resolution center issue when reboot is required

* Order log after actual setter call
2025-07-29 15:59:03 +02:00
Stefan Agner
fbb0915ef8 Mark system as unhealthy if multiple OS installations are found (#6024)
* Add resolution check for duplicate OS installations

* Only create single issue/use separate unhealthy type

* Check MBR partition UUIDs as well

* Use partlabel

* Use generator to avoid code duplication

* Add list of devices, avoid unnecessary exception handling

* Run check only on HAOS

* Fix message formatting

* Fix and simplify pytests

* Fix UnhealthyReason sort order
2025-07-17 10:06:35 +02:00