Persist currently selected slot on successful boot (#5276)

When a boot slot is selected manually in GRUB, the system boots into this
slot and marks it as good. However, the boot order is not changed, so on
the next boot (after an explicit or unexpected reboot) HAOS returns to
the version in the other slot. This can be confusing: if the system has
been running for some time, the user may have forgotten that they
changed the boot slot to fix an issue they had.

This gets even more confusing if the "other" boot slot is selected manually
three times in a row. Say we have ORDER="A B". This means that every time
GRUB starts, it wants to boot slot A. If slot B is selected instead, only
A_TRY is incremented; the system boots into slot B and marks slot B as good
(B_OK=1, B_TRY=0). On the next boot this repeats, and A_TRY is incremented
again. Until A_TRY reaches 3, slot A is always chosen automatically; only
after that does GRUB boot slot B, presuming slot A is dead. Even then, the
ORDER variable remains unchanged.
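The counter behaviour described above can be reproduced with a small simulation. This is a sketch under stated assumptions, not HAOS's actual grub.cfg: it assumes GRUB picks the first slot in ORDER with X_OK=1 and X_TRY below 3, increments that slot's X_TRY before the menu is shown, and that RAUC's "mark good" sets X_OK=1 and X_TRY=0 on the booted slot only. The names `GrubEnv`, `grub_default`, `mark_good` and `boot` are hypothetical.

```python
from dataclasses import dataclass

MAX_TRIES = 3  # assumption: a slot gets 3 attempts before GRUB skips it


@dataclass
class GrubEnv:
    """Hypothetical model of the GRUB environment variables."""
    order: list[str]
    ok: dict[str, int]
    tries: dict[str, int]


def grub_default(env: GrubEnv):
    """Pick the first healthy slot in ORDER with attempts left and bump its X_TRY."""
    for slot in env.order:
        if env.ok[slot] == 1 and env.tries[slot] < MAX_TRIES:
            env.tries[slot] += 1
            return slot
    return None


def mark_good(env: GrubEnv, slot: str) -> None:
    """Assumed effect of RAUC 'mark good': X_OK=1, X_TRY=0 for the booted slot only."""
    env.ok[slot] = 1
    env.tries[slot] = 0


def boot(env: GrubEnv, manual=None):
    """One boot cycle: GRUB computes a default, the user may override it in the menu."""
    default = grub_default(env)
    booted = manual if manual is not None else default
    mark_good(env, booted)
    return default, booted


env = GrubEnv(order=["A", "B"], ok={"A": 1, "B": 1}, tries={"A": 0, "B": 0})
history = [boot(env, manual="B") for _ in range(3)]  # user picks B three times
fourth = boot(env)                                   # no manual selection
```

After three manual selections of B, A_TRY has reached 3, so the fourth boot falls through to B on its own, while `env.order` is still `["A", "B"]`.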

This commit only makes sure that when the system is marked as healthy,
the booted slot is marked both as good AND active, which updates the ORDER
variable as well. The X_TRY counter is incremented by GRUB, and Supervisor
can't know whether it's appropriate to change the other slot's state. So if
we want the other slot not to be marked as bad, the logic in the OS's
grub.cfg needs to be adjusted as well.
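A sketch of the effect of this change on the same simplified model, assuming RAUC's "mark active" on the GRUB backend moves the booted slot to the front of ORDER. It also shows the part this commit deliberately leaves alone: A_TRY keeps the value GRUB gave it, which is why grub.cfg has to be changed separately. The dictionary layout and all helper names are hypothetical.

```python
MAX_TRIES = 3  # assumption: same retry limit as in grub.cfg


def new_env():
    """Hypothetical GRUB environment with ORDER="A B" and both slots healthy."""
    return {"ORDER": ["A", "B"], "OK": {"A": 1, "B": 1}, "TRY": {"A": 0, "B": 0}}


def grub_default(env):
    """Pick the first healthy slot in ORDER with attempts left and bump its X_TRY."""
    for slot in env["ORDER"]:
        if env["OK"][slot] == 1 and env["TRY"][slot] < MAX_TRIES:
            env["TRY"][slot] += 1
            return slot
    return None


def mark_healthy(env, booted):
    # Assumed "mark active": move the booted slot to the front of ORDER.
    env["ORDER"] = [booted] + [s for s in env["ORDER"] if s != booted]
    # "mark good": reset only the booted slot; the other slot is untouched.
    env["OK"][booted] = 1
    env["TRY"][booted] = 0


env = new_env()
grub_default(env)            # GRUB computes A as default and bumps A_TRY
mark_healthy(env, "B")       # user picked B manually; Supervisor marks it healthy
next_default = grub_default(env)
```

With ORDER now "B A", the next boot defaults to B, but A_TRY is still 1 because only GRUB (or grub.cfg logic) can decide whether to reset it.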

I also took the liberty of adjusting the logging a bit, so that the stack
trace is included in the error log if marking the slot fails.
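The difference between the two logging calls can be checked with the standard library alone: `Logger.exception()` logs at ERROR level and appends the traceback of the exception currently being handled, while `Logger.error()` does not (unless `exc_info=True` is passed). In this sketch `RuntimeError` stands in for Supervisor's `DBusError`, and the `capture` helper is hypothetical.

```python
import io
import logging


def capture(log_call) -> str:
    """Run log_call inside an except block and return what the logger wrote."""
    stream = io.StringIO()
    logger = logging.getLogger("mark_healthy_demo")
    logger.handlers = [logging.StreamHandler(stream)]
    logger.propagate = False  # keep output out of the root logger
    try:
        raise RuntimeError("D-Bus call failed")  # stand-in for DBusError
    except RuntimeError:
        log_call(logger, "Can't mark booted partition as healthy!")
    return stream.getvalue()


with_error = capture(lambda lg, msg: lg.error(msg))
with_exception = capture(lambda lg, msg: lg.exception(msg))
```

Only `with_exception` contains the `Traceback (most recent call last):` block and the `RuntimeError: D-Bus call failed` line.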
Jan Čermák 2024-08-30 15:18:44 +02:00 committed by GitHub
parent 12f8ccdf02
commit 08f10c96ef
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194


@@ -366,11 +366,19 @@ class OSManager(CoreSysAttributes):
     async def mark_healthy(self) -> None:
         """Set booted partition as good for rauc."""
         try:
-            response = await self.sys_dbus.rauc.mark(RaucState.GOOD, "booted")
+            responses = [
+                await self.sys_dbus.rauc.mark(RaucState.ACTIVE, "booted"),
+                await self.sys_dbus.rauc.mark(RaucState.GOOD, "booted"),
+            ]
         except DBusError:
-            _LOGGER.error("Can't mark booted partition as healthy!")
+            _LOGGER.exception("Can't mark booted partition as healthy!")
         else:
-            _LOGGER.info("Rauc: %s - %s", self.sys_dbus.rauc.boot_slot, response[1])
+            _LOGGER.info(
+                "Rauc: slot %s - %s, %s",
+                self.sys_dbus.rauc.boot_slot,
+                responses[0][1],
+                responses[1][1],
+            )
             await self.reload()
 
     @Job(
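The new control flow of `mark_healthy` can be exercised without D-Bus by swapping in a fake RAUC object. This is a hypothetical test harness, not Supervisor code: `FakeRauc`, the local `DBusError` and `RaucState` stand-ins, the slot name `rootfs.1` and the boolean return value are all assumptions (the real method calls `self.reload()` instead). Only the `(slot_name, message)` shape of each response mirrors what the diff indexes with `[1]`.

```python
import asyncio
import logging
from enum import Enum

_LOGGER = logging.getLogger(__name__)


class RaucState(str, Enum):
    """Stand-in for Supervisor's RaucState enum."""
    GOOD = "good"
    BAD = "bad"
    ACTIVE = "active"


class DBusError(Exception):
    """Stand-in for Supervisor's DBusError."""


class FakeRauc:
    """Hypothetical replacement for sys_dbus.rauc that records Mark calls."""
    boot_slot = "B"

    def __init__(self, fail: bool = False) -> None:
        self.calls = []
        self.fail = fail

    async def mark(self, state: RaucState, slot_identifier: str):
        if self.fail:
            raise DBusError("Mark failed")
        self.calls.append((state, slot_identifier))
        return ("rootfs.1", f"marked slot rootfs.1 as {state.value}")


async def mark_healthy(rauc: FakeRauc) -> bool:
    """Same structure as the patched method: mark ACTIVE first, then GOOD."""
    try:
        responses = [
            await rauc.mark(RaucState.ACTIVE, "booted"),
            await rauc.mark(RaucState.GOOD, "booted"),
        ]
    except DBusError:
        _LOGGER.exception("Can't mark booted partition as healthy!")
        return False
    _LOGGER.info(
        "Rauc: slot %s - %s, %s", rauc.boot_slot, responses[0][1], responses[1][1]
    )
    return True


ok_rauc = FakeRauc()
ok_result = asyncio.run(mark_healthy(ok_rauc))
failing_rauc = FakeRauc(fail=True)
failing_result = asyncio.run(mark_healthy(failing_rauc))
```

Since both marks sit in one `try` block, a failure of the first call skips the second one and goes straight to the logged traceback.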