* Improve UX of HA CLI wrapper and emergency console
For many users, the emergency console gives feeling that the system is
completely broken. However, there are various cases when the system just takes
just a bit longer to start up and the emergency message is shown, while it
finishes a proper startup shortly after. This change tries to improve the UX in
several ways:
* The limit before a forced emergency console startup is changed to 3 minutes
* Waiting can be interrupted with Ctrl+C (reset counter is cleared then)
* Some hints what to check have been added before starting the shell
* Also, because if the HA CLI failed for 5 times in a row in quick succession,
the CLI startup was then not retried anymore and user may have been left with
a black screen, the restart limits timeouts have been adjusted only to back
off and never mark the unit as failed
Closes#4273
* Use /bin/sh and printf to silence linter errors
RaspberryMatic was renamed to OpenCCU in
https://github.com/OpenCCU/OpenCCU/pull/3162. This caused change of the name of
the directory in the source tarball, causing build failure when the archive
wasn't cached.
The extra information printed when using the top-level makefile can clutter the
output when it needs to be further processed, e.g. when running
`make show-info | jq`. Make it respect the --silent flag (which also suppresses
messages about changing directories which would break parsing as well).
Use the --cidfile Docker CLI argument when starting the container and
bind-mount the generated file containing full ID of the container to the
container itself.
Using --mount instead of --volume is needed, as --volume is racy and creates
empty directory volume at the destination path instead.
This is prerequisite for home-assistant/supervisor#6006 but can come handy for
other cases too.
Upstream commit [1] caused regression in IPv4 routing which can cause some
routes becoming broadcast even though they should be routed as unicast, e.g.:
# ip route get 1.1.1.1
broadcast 1.1.1.1 via 192.168.122.1 dev enp0s3 src 192.168.122.204 uid 0
cache <local,brd>
It's not entirely clear yet why it happens but this behavior seems to be
triggered for instance when the SSDP integration sends the broadcast packet on
HA startup. While this behavior is not described in the regression report [1],
the commit cherry-picked from Linux master fixes the problems for us as well.
Patches moved to version-specific folder, as this one shouldn't be applied on
Raspberry Pi targets.
[1] https://lore.kernel.org/all/20250710142714.12986-1-oscmaes92@gmail.com/
[2] https://lore.kernel.org/stable/20250822165231.4353-4-bacs@librecast.net/Fixes#4265
(cherry picked from commit 78bda4bd10)
Upstream commit [1] caused regression in IPv4 routing which can cause some
routes becoming broadcast even though they should be routed as unicast, e.g.:
# ip route get 1.1.1.1
broadcast 1.1.1.1 via 192.168.122.1 dev enp0s3 src 192.168.122.204 uid 0
cache <local,brd>
It's not entirely clear yet why it happens but this behavior seems to be
triggered for instance when the SSDP integration sends the broadcast packet on
HA startup. While this behavior is not described in the regression report [1],
the commit cherry-picked from Linux master fixes the problems for us as well.
Patches moved to version-specific folder, as this one shouldn't be applied on
Raspberry Pi targets.
[1] https://lore.kernel.org/all/20250710142714.12986-1-oscmaes92@gmail.com/
[2] https://lore.kernel.org/stable/20250822165231.4353-4-bacs@librecast.net/Fixes#4265
Revert patch added to 6.12.43 which breaks the build of CAN_TI_HECC module
present in Tinker config. While it's quite unlikely anyone would be using it,
so we could just simply disable the module, this seems to be a better solution.
This reverts commit 22fe9b19ee.
There are major issues when OS has no internet connectivity - in such case the
script doesn't go the expected happy path after the rework and eventually
removes the Docker image, essentially bricking offline installations.
Since there is no immediate benefit for HAOS and such change turns out to be
high risk considering the planned release, leave it to be implemented later.
Update the buildroot submodule to include:
- BlueZ 5.83 (from 5.79)
- Patch to fix device removal on LE connection abort (upstream PR #1521)
This fixes Bluetooth stability issues where devices get removed from D-Bus
during connection retries, preventing reconnection attempts.
Fixes: https://github.com/bluez/bluez/issues/1489
This knob controls whether Linux throws away its congestion
window (cwnd) after a connection has been idle for at least one
retransmission timeout (RTO). With a value of 0, Linux keeps the
cwnd it had before the idle period and can send that amount
immediately when the application resumes writing (still bounded
by the receiver's advertised window and by pacing).
With slow start after idle enabled (the default), Linux allows
only about 10 MSS (~14 KiB) in the first burst after idle. Even
when a connection stays open to web clients, a short idle forces
multiple round trips to ramp back up.
On Wi-Fi, local connections often have very low RTTs, which drives
the RTO down. Between page navigations the connection is considered
idle by Linux. If the next request happens during a transient
latency spike on the Wi-Fi link, the sender starts with a tiny
cwnd and must grow it over many RTTs, so the spike causes outsized
and visible loading delays.
For devices behind typical Internet uplinks, the higher RTT makes
the initial ramp-up feel even slower until the window regains size.
However, here the connection does take longer to drop to idle, for
Linux standards. So the connection is less likely to be considered
idle between navigations.
This change does not affect flows with very small receive windows
(e.g. many microcontrollers), which are limited by the peer's
advertised window rather than the sender's cwnd.
Example RTOs on low jitter, low loss connections:
Defaults:
TCP_RTO_MIN = 200 ms
TCP_RTO_MAX = 120 s
low-jitter path so rttvar_us = 200 ms
HZ = 1000 or 250 or 100 (depending on the kernel settings)
*31 ms average RTT*
- SRTT ≈ 31 ms; RTTVAR ≈ 200 ms → Sum = 231 ms
- 'usecs_to_jiffies(231000)' = 231 jiffies (HZ 1000) -> RTO ≈ 231 ms
- If 'HZ = 250' (4 ms tick), ceil(231/4)=58 jiffies -> 232 ms RTO
- If 'HZ = 100' (10 ms tick), ceil(231/10)=23 jiffies -> 240 ms RTO
*178 ms average RTT*
- HZ=1000 (1 ms tick): 378 ms RTO
- HZ=250 (4 ms tick): ceil(378/4)=95 -> 380 ms RTO
- HZ=100 (10 ms tick): ceil(378/10)=38 -> 380 ms RTO
*292 ms average RTT*
- HZ=1000 (1 ms tick): 492 ms RTO
- HZ=250 (4 ms tick): ceil(492/4)=123 -> 492 ms RTO
- HZ=100 (10 ms tick): ceil(492/10)=50 -> 500 ms RTO
Any loss or jitter will increase those RTO values.
Set net.ipv4.tcp_thin_linear_timeouts=1 to switch retransmission
timeout (RTO) backoff from exponential to linear for 'thin' TCP flows.
This reduces tail latency for API-style connections that typically have
very few packets in flight, improving recovery from sporadic loss without
changing anything for larger TCP transfers.
Kernel definition: A flow is considered thin when 'tp->packets_out < 4'
and while not in the initial slow start.
See tcp_stream_is_thin(tp) in include/net/tcp.h.
Increase the BlueZ temporary device timeout from the default 30s to 195s.
This prevents devices from being removed from D-Bus during connection
retries, especially when multiple connection attempts are queued.
The 195s timeout aligns with Home Assistant's Bluetooth stack behavior
for ESPHome proxies and prevents the 'device removal spiral' that occurs
when devices timeout during sequential connection attempts.
Add list of hassio components from version.json that are built-in in the data
partition to the GH step summary. For landingpage, get the latest stable
release at the time of the build, as it's what should be published as
homeassistant:landingpage by that time.
Closes#4242
Currently when we run a build with limited set of boards that doesn't include
OVA, the test job fails because the OVA artifact is missing. Add a checkbox for
running tests and ensure that OVA artifact is built if it's enabled.
* Fix scripts/enter.sh so it is usable on macOS
Also, stop requiring `sudo` for actions that do not need it
Tested by building generic_x86_64 target on a macOS machine
Signed-off-by: Marat Radchenko <marat@slonopotamus.org>
* Update scripts/enter.sh
Co-authored-by: Jan Čermák <sairon@users.noreply.github.com>
---------
Signed-off-by: Marat Radchenko <marat@slonopotamus.org>
Co-authored-by: Jan Čermák <sairon@users.noreply.github.com>
Before update to Buildroot 2025.02, the overlays directory on Yellow was
created by rpi-firmware in a condition added confusingly in firmware bump [1].
However, this got lost during Buildroot update, and since Yellow doesn't copy
overlays from the rpi-firmware repo, the directory was never created and the
rpi-rf-mod.dtbo couldn't be copied there in pre-image build hook.
To make things more robust, create the overlays directory for rpi targets
conditionally in the hook instead of relying on rpi-firmware to create it.
[1] f1af1a0bf7Fixes#4233
* Enable publishing of dev builds to R2 without bumping version
We currently can only use Github artifacts for on-demand builds from feature
branches. However, downloading of these requires authentication and it's tricky
to update a device if we need feedback from user testing. On the other hand, we
never want to publish to the dev channel from anything else than from the dev
branch. Restrict version bump to builds from release channels or from the dev
branch only.
* Use YYYYMMDD dev suffix only for published dev branch
For feature builds, or for builds that should not be published, use timestamp
suffix instead of YYYYMMDD. That way feature builds won't collide with dev
releases.