* Add endpoint for complete logs of the latest container startup
Add endpoint that returns complete logs of the latest startup of
container, which can be used for downloading Core logs in the frontend.
Realtime filtering header is used for the Journal API and StartedAt
parameter from the Docker API is used as the reference point. This means
that any other Range header is ignored for this parameter, yet the
"lines" query argument can be used to limit the number of lines. By
default "infinite" number of lines is returned.
Closes#6147
* Implement fallback for latest logs for OS older than 16.0
Implement fallback which uses the internal CONTAINER_LOG_EPOCH metadata
added to logs created by the Docker logger. Still prefer the time-based
method, as it has lower overhead and using public APIs.
* Address review comments
* Only use CONTAINER_LOG_EPOCH for latest logs
As pointed out in the review comments, we might not be able to get the
StartedAt for add-ons that are not running. Thus we need to use the only
reliable mechanism available now, which is the container log epoch.
* Remove dead code for 'Range: realtime' header handling
* Write cidfiles of Docker containers and mount them individually to /run/cid
There is no standard way to get the container ID in the container
itself, which can be needed for instance for #6006. The usual pattern is
to use the --cidfile argument of Docker CLI and mount the generated file
to the container. However, this is feature of Docker CLI and we can't
use it when creating the containers via API. To get container ID to
implement native logging in e.g. Core as well, we need the help of the
Supervisor.
This change implements similar feature fully in Supervisor's DockerAPI
class that orchestrates lifetime of all containers managed by
Supervisor. The files are created in the SUPERVISOR_DATA directory, as
it needs to be persisted between reboots, just as the instances of
Docker containers are.
Supervisor's cidfile must be created when starting the Supervisor
itself, for that see home-assistant/operating-system#4276.
* Address review comments, fix mounting of the cidfile
* Fix NetworkManager connection name for VLANs
The connection name for VLANs should include the parent interface name
for better identification. This was originally the intention, but the
interface object's name property was used which appears empty at that
point.
* Disallow creating multiple connections for the same VLAN id
Only allow a single connection per interface and VLAN id. The regular
network commands can be used to alter the configuration.
* Fix pytest
* Simply connection id name generation
Always rely on the Supervisor interface representation's name attribute
to generate the NetworkManager connection id. Make sure that the name
is correctly set when creating VLAN interfaces as well.
* Special case VLAN configuration
We can't use the match information when comparing Supervisor interface
representation with D-Bus representations. Special case VLAN and
compare using VLAN ID and parent interface.
Note that this currently compares connection UUID of the parent
interface.
* Fix pytest
* Separate VLAN creation logic from apply_changes
Apply changes is really all about updating the NetworkManager settings
of a particular network interface. The base in apply_changes() is
NetworkInterface class, which is the NetworkManager Device abstraction.
All physical interfaces have such a Device hence it is always present.
The only exception is when creating a VLAN: Since it is a virtual
device, there is no device when creating a VLAN.
This separate the two cases. This makes it much easier to reason if
a VLAN already exists or not, and to handle the case where a VLAN
needs to be created.
For all other network interfaces, the apply_changes() method can
now rely on the presence of the NetworkInterface Device abstraction.
* Add VLAN test interface and VLAN exists test
Add a test which checks that an error gets raised when a VLAN for a
particular interface/id combination already exists.
* Address pylint
* Fix test_ignore_veth_only_changes pytest
* Make VLAN interface disabled to avoid test issues
* Reference setting 38 in mocked connection
* Make sure interface type matches
Require a interface type match before doing any comparision.
* Add Supervisor host network configuration tests
* Fix device type checking
* Fix pytest
* Fix tests by taking VLAN interface into account
* Fix test_load_with_network_connection_issues
This seems like a hack, but it turns out that the additional active
connection caused coresys.host.network.update() to be called, which
implicitly "fake" activated the connection. Now it seems that our
mocking causes IPv4 gateway to be set.
So in a way, the test checked a particular mock behavior instead of
actual intention.
The crucial part of this test is that we make sure the settings remain
unchanged. This is done by ensuring that the the method is still auto.
* Fix test_check_network_interface_ipv4.py
Now that we have the VLAN interface active too it will raise an issue
as well.
* Apply suggestions from code review
Co-authored-by: Mike Degatano <michael.degatano@gmail.com>
* Fix ruff check issue
---------
Co-authored-by: Mike Degatano <michael.degatano@gmail.com>
* Rename repository fixture to test_repository
Also don't remove the built-in repositories. The list was incomplete,
and tests don't seem to require that anymore.
* Get rid of StoreType
The type doesn't have much value, we have constant strings anyways.
* Introduce types.py
* Use slug to determine which repository urls to return
* Simplify BuiltinRepository enum
* Mock GitRepo load
* Improve URL handling and repository creation logic
* Refactor update_repositories
* Get rid of get_from_url
It is no longer used in production code.
* More refactoring
* Address pylint
* Introduce is_git_based property to Repository class
Return all git based URLs, including the Core repository.
* Revert "Introduce is_git_based property to Repository class"
This reverts commit dfd5ad79bf.
* Fold type.py into const.py
Align more with how Supervisor code is typically structured.
* Update supervisor/store/__init__.py
Co-authored-by: Mike Degatano <michael.degatano@gmail.com>
* Apply repository remove suggestion
* Fix tests
---------
Co-authored-by: Mike Degatano <michael.degatano@gmail.com>
* Improve DNS plug-in restart
Instead of simply go by PrimaryConnectioon change, use the DnsManager
Configuration property. This property is ultimately used to write the
DNS plug-in configuration, so it is really the relevant information
we pass on to the plug-in.
* Check for changes and restart DNS plugin
* Check for changes in plug-in DNS
Cache last local (NetworkManager) provided DNS servers. Check against
this DNS server list when deciding when to restart the DNS plug-in.
* Check connectivity unthrottled in certain situations
* Fix pytest
* Fix pytest
* Improve test coverage for DNS plugins restart functionality
* Apply suggestions from code review
Co-authored-by: Mike Degatano <michael.degatano@gmail.com>
* Debounce local DNS changes and event based connectivity checks
* Remove connection check logic
* Remove unthrottled connectivity check
* Fix delayed call
* Store restart task and cancel in case a restart is running
* Improve DNS configuration change tests
* Remove stale code
* Improve DNS plug-in tests, less mocking
* Cover multiple private functions at once
Improve tests around notify_locals_changed() to cover multiple
functions at once.
---------
Co-authored-by: Mike Degatano <michael.degatano@gmail.com>
* Use Docker BuildKit to build addons
* Improve error message as suggested by CodeRabbit
* Fix container.remove() tests missing v=True
* Ignore squash rather than falling back to legacy builder
* Use version rather than tag to avoid confusion in run_command()
* Fix tests differently
* Use PropertyMock like other tests
* Restore position of fix_label fn
* Exempt addon builder image from unsupported checks
* Refactor tests
* Fix tests expecting wrong builder image
* Remove harcoded paths
* Fix tests
* Remove get_addon_host_path() function
* Use docker buildx build rather than docker build
Co-authored-by: Stefan Agner <stefan@agner.ch>
---------
Co-authored-by: Stefan Agner <stefan@agner.ch>
* Fix mypy issues in store module
* Fix mypy issues in utils module
* Fix mypy issues in all remaining source files
* Fix ingress user typeddict
* Fixes from feedback
* Fix mypy issues after installing docker-types
This reverts commit 63fde3b410.
This change introduced another more severe regression, causing all
add-ons that haven't been started since Supervisor startup to cause
errors during their backup. More sophisticated check would have to be
implemented to address edge cases during backups for non-existing
add-ons (or their config actually).
Fixes#5924
* Use journal-gatewayd's new /boots endpoint to list boots
Current method we use for getting boots has several known downsides, for
example it can miss some incomplete boots and the performance might be
worse than what we could get by using Systemd directly. Systemd was
missing a method to get list boots through the journal-gatewayd but that
should be addressed by the new /boots endpoint added in [1] which
returns application/json-seq response containing all boots as reported
in `journalctl --list-boots`.
Implement Supervisor methods to parse this format and use the endpoint
at first, falling back to the old method if it fails.
[1] https://github.com/systemd/systemd/pull/37574
* Log info instead of warning when /boots is not present
Co-authored-by: Stefan Agner <stefan@agner.ch>
* Split records only by RS instead of LF in journal_boots_reader
* Strip only RS, json.loads is fine with whitespace
---------
Co-authored-by: Stefan Agner <stefan@agner.ch>
* Recreate aiohttp ClientSession after DNS plug-in load
Create a temporary ClientSession early in case we need to load version
information from the internet. This doesn't use the final DNS setup
and hence might fail to load in certain situations since we don't have
the fallback mechanims in place yet. But if the DNS container image
is present, we'll continue the setup and load the DNS plug-in. We then
can recreate the ClientSession such that it uses the DNS plug-in.
This works around an issue with aiodns, which today doesn't reload
`resolv.conf` automatically when it changes. This lead to Supervisor
using the initial `resolv.conf` as created by Docker. It meant that
we did not use the DNS plug-in (and its fallback capabilities) in
Supervisor. Also it meant that changes to the DNS setup at runtime
did not propagate to the aiohttp ClientSession (as observed in #5332).
* Mock aiohttp.ClientSession for all tests
Currently in several places pytest actually uses the aiohttp
ClientSession and reaches out to the internet. This is not ideal
for unit tests and should be avoided.
This creates several new fixtures to aid this effort: The `websession`
fixture simply returns a mocked aiohttp.ClientSession, which can be
used whenever a function is tested which needs the global websession.
A separate new fixture to mock the connectivity check named
`supervisor_internet` since this is often used through the Job
decorator which require INTERNET_SYSTEM.
And the `mock_update_data` uses the already existing update json
test data from the fixture directory instead of loading the data
from the internet.
* Log ClientSession nameserver information
When recreating the aiohttp ClientSession, log information what
nameservers exactly are going to be used.
* Refuse ClientSession initialization when API is available
Previous attempts to reinitialize the ClientSession have shown
use of the ClientSession after it was closed due to API requets
being handled in parallel to the reinitialization (see #5851).
Make sure this is not possible by refusing to reinitialize the
ClientSession when the API is available.
* Fix pytests
Also sure we don't create aiohttp ClientSession objects unnecessarily.
* Apply suggestions from code review
Co-authored-by: Jan Čermák <sairon@users.noreply.github.com>
---------
Co-authored-by: Jan Čermák <sairon@users.noreply.github.com>
* Add API for swap configuration
Add HTTP API for swap size and swappiness to /os/config/swap. Individual
options can be set in JSON and are calling the DBus API added in OS
Agent 1.7.x, available since OS 15.0. Check for presence of OS of the
required version and return 404 if the criteria are not met.
* Fix type hints and reboot_required logic
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
* Fix formatting after adding suggestions from GH
* Address @mdegat01 review comments
- Improve swap options validation
- Add swap to the 'all' property of dbus agent
- Use APINotFound with reason instead of HTTPNotFound
- Reorder API routes
---------
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
* Finish out effort of adding and enabling blockbuster
* Skip getting addon file size until securetar fixed
* Fix test for devcontainer and blocking I/O
* Fix docker fixture and load_config to post_init
* Add blockbuster library and find I/O from unit tests
* Fix lint and test issue
* Fixes from feedback
* Avoid modifying webapp object in executor
* Split su options validation and only validate timezone on change
* Replace non-unicode characters for add-on static files
Add-on documentation and changelog get read and returned as text file.
However, in case the original author used non-unicode characters, or
the file corrupted, loading currently fails with an UnicodeDecodeError.
Let's just use the built-in replace error handling of Python, so they
appear for the user as non-unicode characters by replacing them with
the official unicode replacement character "�".
* Remove superflous parameter for binary files
* ruff format
* Add pytests
Since #5696 we don't need to load the resolution center early. In fact,
with #5686 this is even problematic for pytests in devcontainer, since
the Supervisor Core state is valid and this causes AppArmor evaluations
to run (and fail).
Actually, #5696 removed the resolution center. #5686 brought it
accidentally back. This was seemingly a merge error.
* Initialize Supervisor Core state in constructor
Make sure the Supervisor Core state is set to a value early on. This
makes sure that the state is always of type CoreState, and makes sure
that any use of the state can rely on it being an actual value from the
CoreState enum.
This fixes Sentry filter during early startup, where the state
previously was None. Because of that, the Sentry filter tried to
collect more Context, which lead to an exception and not reporting
errors.
* Fix pytest
It seems that with initializing the state early, the pytest actually
runs a system evaluation with:
Starting system evaluation with state initialize
Before it did that with:
Starting system evaluation with state None
It detects that the container runs as privileged, and declares the
system as unhealthy.
It is unclear to me why coresys.core.healthy was checked in this
context, it doesn't seem useful. Just remove the check, and validate
the state through the getter instead.
* Update supervisor/core.py
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
* Make sure Supervisor container is privileged in pytest
With the Supervisor Core state being valid now, some evaluations
now actually run when loading the resolution center. This leads to
Supervisor getting declared unhealthy due to not running in a privileged
container under pytest.
Fake the host container to be privileged to make evaluations not
causing the system to be declared unhealthy under pytest.
* Avoid writing actual Supervisor run state file
With the Supervisor Core state being valid from the very start, we end
up writing a state everytime.
Instead of actually writing a state file, simply validate the the
necessary calls are being made. This is more conform to typical unit
tests and avoids writing a file for every test.
* Extend WebSocket client fixture and use it consistently
Extend the ha_ws_client WebSocket client fixture to set Supervisor Core
into run state and clear all pending messages.
Currently only some tests use the ha_ws_client WebSocket client fixture.
Use it consistently for all tests.
---------
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
* Load resolution evaluation, check and fixups early
Before #5652, these modules were loaded in the constructor, hence early
in `initialize_coresys()`. Moving them late actually exposed an issue
where NetworkManager connectivity setter couldn't get the
`connectivity_check` evaluation, leading to an exception early in
bootstrap.
Technically, it might be safe to load the resolution modules only in
`Core.connect()`, however then we'd have to load them separately for
pytest. Let's go conservative and load them the same place where they
got loaded before #5652.
* Load resolution modules in a single executor call
* Fix pytest
* Avoid IO in event loop when removing backup
* Refactor backup size calculation
Currently size is lazy loaded when required via properties. This
however is blocking the async event loop.
Backup sizes don't change. Instead of lazy loading the size of a backup
simply determine it on loading/after creation.
* Fix tests for backup size change
* Avoid IO in event loop when loading backups
* Avoid IO in event loop when importing a backup
* Delay inital version fetch until there is connectivity
* Add test
* Only mock get not whole websession object
* drive delayed fetch off of supervisor connectivity not host
* Fix test to not rely on sleep guessing to track tasks
* Use fixture to remove job throttle temporarily
* Bump Supervisor to Python 3.13
* Update ruff configuration to 0.9.1
Adjust pyproject.toml for ruff 0.9.1. Also make sure that latest version
of ruff is used in pre-commit.
* Set default configuration for pytest-asyncio
* Run ruff check
* Drop deprecated decorator no_type_check_decorator
The upstream PR (https://github.com/python/cpython/issues/106309) says
this never got really implemented by type checkers.
* Bump devcontainer to latest release
This PR minimizes the D-Bus requirements for tests. It does this by
using dbus-daemon directly instead of dbus-launch. The latter is meant
for graphical applications and therefor has X11 dependencies. It also
leaves the D-Bus daemon running after the tests are done. This will
accumulate dbus-daemon processes over time which is not ideal.
I've also considered using dbus-run-session since it is meant to launch
processes with a private D-Bus session. For Python tests one could
launch it like so:
dbus-run-session -- python3 -m pytest ...
Then `DBUS_SESSION_BUS_ADDRESS` would be used automatically by the
`MessageBus` class. However, to keep the current behavior of the tests,
launching the D-Bus daemon manually is the better option.
* Use signals to recognize new disks immediately
* Add test for disabled data disk issue
* Add mock of UDisks2 base service to test
* Apply suggestions from code review
* Shutdown manager first to avoid potential race conditions
* Update tests/dbus_service_mocks/udisks2.py
Co-authored-by: Jan Čermák <sairon@users.noreply.github.com>
---------
Co-authored-by: Stefan Agner <stefan@agner.ch>
Co-authored-by: Jan Čermák <sairon@users.noreply.github.com>
* Allow adoption of existing data disk
* Fix existing tests
* Add test cases and fix image issues
* Fix addon build test
* Run checks during setup not startup
* Addon load mimics plugin and HA load for docker part
* Default image accessible in except
* Use Journal Export Format for host (advanced) logs
Add methods for handling Journal Export Format and use it for fetching
of host logs. This is foundation for colored streaming logs for other
endpoints as well.
* Make pylint happier - remove extra pass statement
* Rewrite journal gateway tests to mock ClientResponse's StreamReader
* Handle connection refused error when connecting to journal-gatewayd
* Use SYSTEMD_JOURNAL_GATEWAYD_SOCKET global path also for connection
* Use parsing algorithm suggested by @agners in review
* Fix timestamps in formatting, always use UTC for now
* Add tests for Accept header in host logs
* Apply suggestions from @agners
Co-authored-by: Stefan Agner <stefan@agner.ch>
* Bail out of parsing earlier if field is not in required fields
* Fix parsing issue discovered in the wild and add test case
* Make verbose formatter more tolerant
* Use some bytes' native functions for some minor optimizations
* Move MalformedBinaryEntryError to exceptions module, add test for it
---------
Co-authored-by: Stefan Agner <stefan@agner.ch>
* Migrate to Ruff for lint and format
* Fix pylint issues
* DBus property sets into normal awaitable methods
* Fix tests relying on separate tasks in connect
* Fixes from feedback
* Add background option to backup APIs
* Fix decorator tests
* Working error handling, initial test cases
* Change to schedule_job and always return job id
* Add tests
* Reorder call at/later args
* Validation errors return immediately in background
* None is invalid option for background
* Must pop the background option from body
* Don't check if Core is running to trigger rollback
Currently we check for Core API access and that the state is running. If
this is not fulfilled within 5 minutes, we rollback to the previous
version.
It can take quite a while until Home Assistant Core is in state running.
In fact, after going through bootstrap, it can theoretically take
indefinitely (as in there is no timeout from Core side).
So to trigger rollback, rather than check the state to be running, just
check if the API is accessible in this case. This prevents spurious
rollbacks.
* Check Core status with and timeout after a longer time
Instead of checking the Core API just for response, do check the
state. Use a timeout which is long enough to cover all stages and
other timeouts during Core startup.
* Introduce get_api_state and better status messages
* Update supervisor/homeassistant/api.py
Co-authored-by: J. Nick Koston <nick@koston.org>
* Add successful start test
---------
Co-authored-by: J. Nick Koston <nick@koston.org>
* Backup and restore track progress in job
* Change to stage only updates and fix tests
* Leave HA alone if it wasn't restored
* skip check HA stage message when we don't check
* Change to helper to get current job
* Fix tests
* Mark jobs as internal to skip notifying HA
* Add job group execution limit option
* Fix pylint issues
* Assign variable before usage
* Cleanup jobs when done
* Remove isinstance check for performance
* Explicitly raise from None
* Add some more documentation info
* Reduce executor code for docker
* Fix pylint errors and move import/export image
* Fix test and a couple other risky executor calls
* Fix dataclass and return
* Fix test case and add one for corrupt docker
* Add some coverage
* Undo changes to docker manager startup