Support containerd snapshotter progress reporting

Fix docker image pull progress blocked by small layers
Small Docker layers (typically <100 bytes) can skip the downloading phase entirely, going directly from "Pulling fs layer" to "Download complete" without emitting any progress events with byte counts. This caused the aggregate progress calculation to block indefinitely, as it required all layer jobs to have their `extra` field populated with byte counts before proceeding. The issue manifested as parent job progress jumping from 0% to 97.9% after long delays, as seen when a 96-byte layer held up progress reporting for ~50 seconds until it finally reached the "Extracting" phase. Set a minimal `extra` field (current=1, total=1) when layers reach "Download complete" without having gone through the downloading phase. This allows the aggregate progress calculation to proceed immediately while still correctly representing the layer as 100% downloaded. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-06 09:29:31 +00:00 · 2025-11-05 15:09:17 +01:00 · 2025-11-05 14:40:16 +01:00 · 2025-11-05 12:17:51 +01:00 · 2025-11-04 17:12:15 -05:00
6 changed files with 204 additions and 12 deletions
--- a/.github/workflows/sentry.yaml
+++ b/.github/workflows/sentry.yaml
@@ -12,7 +12,7 @@ jobs:
      - name: Check out code from GitHub
        uses: actions/checkout@08c6903cd8c0fde910a37f88322edcfb5dd907a8 # v5.0.0
      - name: Sentry Release
-        uses: getsentry/action-release@4f502acc1df792390abe36f2dcb03612ef144818 # v3.3.0
+        uses: getsentry/action-release@128c5058bbbe93c8e02147fe0a9c713f166259a6 # v3.4.0
        env:
          SENTRY_AUTH_TOKEN: ${{ secrets.SENTRY_AUTH_TOKEN }}
          SENTRY_ORG: ${{ secrets.SENTRY_ORG }}
--- a/supervisor/api/utils.py
+++ b/supervisor/api/utils.py
@@ -151,7 +151,7 @@ def api_return_error(
        if check_exception_chain(error, DockerAPIError):
            message = format_message(message)
    if not message:
-        message = "Unknown error, see supervisor"
+        message = "Unknown error, see Supervisor logs (check with 'ha supervisor logs')"

    match error_type:
        case const.CONTENT_TYPE_TEXT:
--- a/supervisor/docker/interface.py
+++ b/supervisor/docker/interface.py
@@ -217,7 +217,7 @@ class DockerInterface(JobGroup, ABC):

        await self.sys_run_in_executor(self.sys_docker.docker.login, **credentials)

-    def _process_pull_image_log(
+    def _process_pull_image_log(  # noqa: C901
        self, install_job_id: str, reference: PullLogEntry
    ) -> None:
        """Process events fired from a docker while pulling an image, filtered to a given job id."""
@@ -318,13 +318,17 @@ class DockerInterface(JobGroup, ABC):
                },
            )
        else:
+            # If we reach DOWNLOAD_COMPLETE without ever having set extra (small layers that skip
+            # the downloading phase), set a minimal extra so aggregate progress calculation can proceed
+            extra = job.extra
+            if stage == PullImageLayerStage.DOWNLOAD_COMPLETE and not job.extra:
+                extra = {"current": 1, "total": 1}
+
            job.update(
                progress=progress,
                stage=stage.status,
                done=stage == PullImageLayerStage.PULL_COMPLETE,
-                extra=None
-                if stage == PullImageLayerStage.RETRYING_DOWNLOAD
-                else job.extra,
+                extra=None if stage == PullImageLayerStage.RETRYING_DOWNLOAD else extra,
            )

        # Once we have received a progress update for every child job, start to set status of the main one
@@ -337,23 +341,40 @@ class DockerInterface(JobGroup, ABC):
        ]

        # First set the total bytes to be downloaded/extracted on the main job
+        # Note: With containerd snapshotter, total may be None (time-based progress instead of byte-based)
        if not install_job.extra:
            total = 0
+            has_byte_progress = True
            for job in layer_jobs:
                if not job.extra:
                    return
+                # If any layer has None for total, we can't do byte-weighted aggregation
+                if job.extra["total"] is None:
+                    has_byte_progress = False
+                    break
                total += job.extra["total"]
-            install_job.extra = {"total": total}
+
+            # Store whether we have byte-based progress for later use
+            install_job.extra = {"total": total if has_byte_progress else None}
        else:
            total = install_job.extra["total"]
+            has_byte_progress = total is not None

-        # Then determine total progress based on progress of each sub-job, factoring in size of each compared to total
+        # Then determine total progress based on progress of each sub-job
+        # If we have byte counts, weight by size. Otherwise, simple average.
        progress = 0.0
        stage = PullImageLayerStage.PULL_COMPLETE
        for job in layer_jobs:
            if not job.extra:
                return
-            progress += job.progress * (job.extra["total"] / total)
+
+            if has_byte_progress:
+                # Byte-weighted progress (classic Docker behavior)
+                progress += job.progress * (job.extra["total"] / total)
+            else:
+                # Simple average progress (containerd snapshotter with time-based progress)
+                progress += job.progress / len(layer_jobs)
+
            job_stage = PullImageLayerStage.from_status(cast(str, job.stage))

            if job_stage < PullImageLayerStage.EXTRACTING:
--- a/supervisor/jobs/init.py
+++ b/supervisor/jobs/init.py
@@ -98,7 +98,9 @@ class SupervisorJobError:
    """Representation of an error occurring during a supervisor job."""

    type_: type[HassioError] = HassioError
-    message: str = "Unknown error, see supervisor logs"
+    message: str = (
+        "Unknown error, see Supervisor logs (check with 'ha supervisor logs')"
+    )
    stage: str | None = None

    def as_dict(self) -> dict[str, str | None]:
--- a/tests/docker/test_interface.py
+++ b/tests/docker/test_interface.py
@@ -569,3 +569,172 @@ async def test_install_progress_handles_download_restart(
        await event.wait()

    capture_exception.assert_not_called()
+
+
+async def test_install_progress_handles_layers_skipping_download(
+    coresys: CoreSys,
+    test_docker_interface: DockerInterface,
+    capture_exception: Mock,
+):
+    """Test install handles small layers that skip downloading phase and go directly to download complete."""
+    coresys.core.set_state(CoreState.RUNNING)
+    # Simulate multiple layers where one small layer (96 bytes) skips the downloading phase
+    # This layer should not block progress reporting for the parent job
+    coresys.docker.docker.api.pull.return_value = [
+        {"status": "Pulling from test/image", "id": "latest"},
+        # Layer 1: Normal layer with downloading phase
+        {"status": "Pulling fs layer", "progressDetail": {}, "id": "layer1"},
+        {
+            "status": "Downloading",
+            "progressDetail": {"current": 100, "total": 1000},
+            "progress": "[=====>                                             ]     100B/1000B",
+            "id": "layer1",
+        },
+        {
+            "status": "Downloading",
+            "progressDetail": {"current": 1000, "total": 1000},
+            "progress": "[==================================================>]    1000B/1000B",
+            "id": "layer1",
+        },
+        {"status": "Download complete", "progressDetail": {}, "id": "layer1"},
+        {
+            "status": "Extracting",
+            "progressDetail": {"current": 1000, "total": 1000},
+            "progress": "[==================================================>]    1000B/1000B",
+            "id": "layer1",
+        },
+        {"status": "Pull complete", "progressDetail": {}, "id": "layer1"},
+        # Layer 2: Small layer that skips downloading (like 02a6e69d8d00 from the logs)
+        {"status": "Pulling fs layer", "progressDetail": {}, "id": "layer2"},
+        {"status": "Waiting", "progressDetail": {}, "id": "layer2"},
+        # Goes straight to Download complete without Downloading phase
+        {"status": "Download complete", "progressDetail": {}, "id": "layer2"},
+        {
+            "status": "Extracting",
+            "progressDetail": {"current": 96, "total": 96},
+            "progress": "[==================================================>]      96B/96B",
+            "id": "layer2",
+        },
+        {"status": "Pull complete", "progressDetail": {}, "id": "layer2"},
+        {"status": "Digest: sha256:test"},
+        {"status": "Status: Downloaded newer image for test/image:latest"},
+    ]
+
+    with patch.object(
+        type(coresys.supervisor), "arch", PropertyMock(return_value="amd64")
+    ):
+        # Schedule job so we can observe that it completes successfully
+        event = asyncio.Event()
+        job, install_task = coresys.jobs.schedule_job(
+            test_docker_interface.install,
+            JobSchedulerOptions(),
+            AwesomeVersion("1.2.3"),
+            "test",
+        )
+
+        async def listen_for_job_end(reference: SupervisorJob):
+            if reference.uuid != job.uuid:
+                return
+            event.set()
+
+        coresys.bus.register_event(BusEvent.SUPERVISOR_JOB_END, listen_for_job_end)
+        await install_task
+        await event.wait()
+
+    # The key assertion: Job should complete successfully without errors
+    # Without the fix, layer2 would block all progress reporting until it reached Extracting,
+    # preventing the aggregate progress calculation from running
+    assert job.done is True
+    assert job.progress == 100
+    capture_exception.assert_not_called()
+
+
+async def test_install_progress_handles_containerd_snapshotter(
+    coresys: CoreSys,
+    test_docker_interface: DockerInterface,
+    capture_exception: Mock,
+):
+    """Test install handles containerd snapshotter time-based progress (total=None)."""
+    coresys.core.set_state(CoreState.RUNNING)
+    # Containerd snapshotter reports extraction progress as time elapsed (e.g., "7 s")
+    # with current=7, total=None instead of byte-based progress
+    coresys.docker.docker.api.pull.return_value = [
+        {"status": "Pulling from test/image", "id": "latest"},
+        {"status": "Pulling fs layer", "progressDetail": {}, "id": "layer1"},
+        {
+            "status": "Downloading",
+            "progressDetail": {"current": 100, "total": 1000},
+            "progress": "[=====>                                             ]     100B/1000B",
+            "id": "layer1",
+        },
+        {
+            "status": "Downloading",
+            "progressDetail": {"current": 1000, "total": 1000},
+            "progress": "[==================================================>]    1000B/1000B",
+            "id": "layer1",
+        },
+        {"status": "Download complete", "progressDetail": {}, "id": "layer1"},
+        {
+            "status": "Extracting",
+            "progressDetail": {"current": 1000, "total": 1000},
+            "progress": "[==================================================>]    1000B/1000B",
+            "id": "layer1",
+        },
+        {"status": "Pull complete", "progressDetail": {}, "id": "layer1"},
+        # Layer 2: Containerd snapshotter with time-based extraction
+        {"status": "Pulling fs layer", "progressDetail": {}, "id": "layer2"},
+        {
+            "status": "Downloading",
+            "progressDetail": {"current": 50, "total": 500},
+            "progress": "[=====>                                             ]      50B/500B",
+            "id": "layer2",
+        },
+        {
+            "status": "Downloading",
+            "progressDetail": {"current": 500, "total": 500},
+            "progress": "[==================================================>]     500B/500B",
+            "id": "layer2",
+        },
+        {"status": "Download complete", "progressDetail": {}, "id": "layer2"},
+        # Time-based extraction progress (containerd snapshotter)
+        {
+            "status": "Extracting",
+            "progressDetail": {"current": 3, "total": None},
+            "progress": "3 s",
+            "id": "layer2",
+        },
+        {
+            "status": "Extracting",
+            "progressDetail": {"current": 7, "total": None},
+            "progress": "7 s",
+            "id": "layer2",
+        },
+        {"status": "Pull complete", "progressDetail": {}, "id": "layer2"},
+        {"status": "Digest: sha256:test"},
+        {"status": "Status: Downloaded newer image for test/image:latest"},
+    ]
+
+    with patch.object(
+        type(coresys.supervisor), "arch", PropertyMock(return_value="amd64")
+    ):
+        event = asyncio.Event()
+        job, install_task = coresys.jobs.schedule_job(
+            test_docker_interface.install,
+            JobSchedulerOptions(),
+            AwesomeVersion("1.2.3"),
+            "test",
+        )
+
+        async def listen_for_job_end(reference: SupervisorJob):
+            if reference.uuid != job.uuid:
+                return
+            event.set()
+
+        coresys.bus.register_event(BusEvent.SUPERVISOR_JOB_END, listen_for_job_end)
+        await install_task
+        await event.wait()
+
+    # The key assertion: Job should complete without crashing on None total
+    assert job.done is True
+    assert job.progress == 100
+    capture_exception.assert_not_called()
--- a/tests/jobs/test_job_manager.py
+++ b/tests/jobs/test_job_manager.py
@@ -198,7 +198,7 @@ async def test_notify_on_change(coresys: CoreSys, ha_ws_client: AsyncMock):
                        "errors": [
                            {
                                "type": "HassioError",
-                                "message": "Unknown error, see supervisor logs",
+                                "message": "Unknown error, see Supervisor logs (check with 'ha supervisor logs')",
                                "stage": "test",
                            }
                        ],
@@ -226,7 +226,7 @@ async def test_notify_on_change(coresys: CoreSys, ha_ws_client: AsyncMock):
                    "errors": [
                        {
                            "type": "HassioError",
-                            "message": "Unknown error, see supervisor logs",
+                            "message": "Unknown error, see Supervisor logs (check with 'ha supervisor logs')",
                            "stage": "test",
                        }
                    ],
Author	SHA1	Message	Date
Stefan Agner	230f359bf7	Support containerd snapshotter progress reporting	2025-11-05 15:09:17 +01:00
Stefan Agner	f996e60784	Fix docker image pull progress blocked by small layers Small Docker layers (typically <100 bytes) can skip the downloading phase entirely, going directly from "Pulling fs layer" to "Download complete" without emitting any progress events with byte counts. This caused the aggregate progress calculation to block indefinitely, as it required all layer jobs to have their `extra` field populated with byte counts before proceeding. The issue manifested as parent job progress jumping from 0% to 97.9% after long delays, as seen when a 96-byte layer held up progress reporting for ~50 seconds until it finally reached the "Extracting" phase. Set a minimal `extra` field (current=1, total=1) when layers reach "Download complete" without having gone through the downloading phase. This allows the aggregate progress calculation to proceed immediately while still correctly representing the layer as 100% downloaded. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-11-05 14:40:16 +01:00
dependabot[bot]	c9ceb4a4e3	Bump getsentry/action-release from 3.3.0 to 3.4.0 (#6284 ) Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2025-11-05 12:17:51 +01:00
Ashton	d33305379f	Improve error message clarity by specifying to check supervisor logs (#6250 ) * Improve error message clarity by specifying to check supervisor logs with 'ha supervisor logs' * Fix ruff, supervisor -> Supervisor --------- Co-authored-by: Jan Čermák <sairon@sairon.cz>	2025-11-04 17:12:15 -05:00