Clamp progress to 100 to prevent floating point precision issues

Floating point arithmetic in weighted progress calculations can produce values slightly above 100 (e.g., 100.00000000000001). This causes validation errors when the progress value is checked.

Add min(100, ...) clamping to both size-weighted and count-based progress calculations to ensure the result never exceeds 100.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
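
A minimal sketch of the clamping idea described in this commit message. The helper name `clamp_progress` is hypothetical and used only for illustration; the change itself applies `min(100, ...)` inline, and this sketch uses `100.0` so the result stays a float.

```python
def clamp_progress(value: float) -> float:
    """Cap a computed progress percentage at 100."""
    return min(100.0, value)


# The example value from the commit message: one float step above 100.
raw = 100.00000000000001
print(raw > 100)            # True
print(clamp_progress(raw))  # 100.0
```

Values at or below 100 pass through unchanged, so the clamp only affects results that overshoot because of rounding.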
Add registry manifest fetcher for size-based pull progress
2025-12-03 06:28:23 +00:00 · 2025-12-02 21:18:36 +01:00 · 2025-12-02 21:18:34 +01:00 · 2025-12-02 21:17:03 +01:00 · 2025-12-02 21:17:03 +01:00 · 2025-12-02 21:16:57 +01:00
19 changed files with 1119 additions and 77 deletions
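
The size-weighted calculation added to supervisor/docker/pull_progress.py in this diff can be sketched in isolation as follows. This is a simplified illustration, not the code from the change: layers are represented as plain `(progress, size)` tuples instead of `LayerProgress` objects, the function name `size_weighted_progress` is hypothetical, and returning 0.0 for an empty list is an assumption made here to keep the sketch self-contained.

```python
def size_weighted_progress(layers: list[tuple[float, int]]) -> float:
    """Combine per-layer progress (0-100) weighted by layer size in bytes.

    Falls back to an equal-weight average when no sizes are known.
    """
    if not layers:
        # Assumption for this sketch: nothing to report yet.
        return 0.0

    total_size = sum(size for _, size in layers)
    if total_size == 0:
        # No size information: every layer counts equally.
        return sum(progress for progress, _ in layers) / len(layers)

    # Each layer contributes in proportion to its share of the total bytes.
    return sum(
        progress * size / total_size for progress, size in layers if size > 0
    )


# One small finished layer and one large layer that has not started.
print(size_weighted_progress([(100.0, 100), (0.0, 900)]))  # 10.0
```

With equal weighting the same two layers would report 50, even though only a tenth of the bytes have been downloaded.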
--- a/.dockerignore
+++ b/.dockerignore
@@ -1,6 +1,7 @@
 # General files
 .git
 .github
+.gitkeep
 .devcontainer
 .vscode

--- a/.github/workflows/builder.yml
+++ b/.github/workflows/builder.yml
@@ -72,19 +72,89 @@ jobs:

      - name: Get changed files
        id: changed_files
-        if: steps.version.outputs.publish == 'false'
+        if: github.event_name != 'release'
        uses: masesgroup/retrieve-changed-files@491e80760c0e28d36ca6240a27b1ccb8e1402c13 # v3.0.0

      - name: Check if requirements files changed
        id: requirements
        run: |
-          if [[ "${{ steps.changed_files.outputs.all }}" =~ (requirements.txt|build.yaml) ]]; then
+          # No wheels build necessary for releases
+          if [[ "${{ github.event_name }}" == "release" ]]; then
+            echo "changed=false" >> "$GITHUB_OUTPUT"
+          elif [[ "${{ steps.changed_files.outputs.all }}" =~ (requirements\.txt|build\.yaml|\.github/workflows/builder\.yml) ]]; then
            echo "changed=true" >> "$GITHUB_OUTPUT"
          fi

+  build_wheels:
+    name: Build wheels for ${{ matrix.arch }}
+    needs: init
+    if: needs.init.outputs.requirements == 'true'
+    runs-on: ${{ matrix.runs-on }}
+    strategy:
+      matrix:
+        arch: ${{ fromJson(needs.init.outputs.architectures) }}
+        include:
+          - runs-on: ubuntu-24.04
+          - arch: aarch64
+            runs-on: ubuntu-24.04-arm
+
+    env:
+      ABI: cp313
+      TAG: musllinux_1_2
+      APK_DEPS: "libffi-dev;openssl-dev;yaml-dev"
+      SKIP_BINARY: aiohttp
+    steps:
+      - name: Checkout the repository
+        uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3 # v6.0.0
+
+      - name: Write env-file
+        run: |
+          (
+            # Fix out of memory issues with rust
+            echo "CARGO_NET_GIT_FETCH_WITH_CLI=true"
+          ) > .env_file
+
+      - name: Build and publish wheels
+        if: needs.init.outputs.publish == 'true'
+        uses: home-assistant/wheels@e5742a69d69f0e274e2689c998900c7d19652c21 # 2025.12.0
+        with:
+          wheels-key: ${{ secrets.WHEELS_KEY }}
+          abi: ${{ env.ABI }}
+          tag: ${{ env.TAG }}
+          arch: ${{ matrix.arch }}
+          apk: ${{ env.APK_DEPS }}
+          skip-binary: ${{ env.SKIP_BINARY }}
+          env-file: true
+          requirements: "requirements.txt"
+
+      - name: Build local wheels
+        uses: home-assistant/wheels@e5742a69d69f0e274e2689c998900c7d19652c21 # 2025.12.0
+        if: needs.init.outputs.publish == 'false'
+        with:
+          wheels-host: ""
+          wheels-user: ""
+          wheels-key: ""
+          local-wheels-repo-path: "wheels/"
+          abi: ${{ env.ABI }}
+          tag: ${{ env.TAG }}
+          arch: ${{ matrix.arch }}
+          apk: ${{ env.APK_DEPS }}
+          skip-binary: ${{ env.SKIP_BINARY }}
+          env-file: true
+          requirements: "requirements.txt"
+
+      - name: Upload local wheels artifact
+        if: needs.init.outputs.publish == 'false'
+        uses: actions/upload-artifact@330a01c490aca151604b8cf639adc76d48f6c5d4 # v5.0.0
+        with:
+          name: wheels-${{ matrix.arch }}
+          path: wheels
+          retention-days: 1
+
  build:
    name: Build ${{ matrix.arch }} supervisor
-    needs: init
+    needs: [init, build_wheels]
+    if: ${{ !cancelled() && !failure() }}
    runs-on: ubuntu-latest
    permissions:
      contents: read
@@ -99,27 +169,12 @@ jobs:
        with:
          fetch-depth: 0

-      - name: Write env-file
-        if: needs.init.outputs.requirements == 'true'
-        run: |
-          (
-            # Fix out of memory issues with rust
-            echo "CARGO_NET_GIT_FETCH_WITH_CLI=true"
-          ) > .env_file
-
-      # home-assistant/wheels doesn't support sha pinning
-      - name: Build wheels
-        if: needs.init.outputs.requirements == 'true'
-        uses: home-assistant/wheels@2025.11.0
+      - name: Download local wheels artifact
+        if: needs.init.outputs.requirements == 'true' && needs.init.outputs.publish == 'false'
+        uses: actions/download-artifact@018cc2cf5baa6db3ef3c5f8a56943fffe632ef53 # v6.0.0
        with:
-          abi: cp313
-          tag: musllinux_1_2
-          arch: ${{ matrix.arch }}
-          wheels-key: ${{ secrets.WHEELS_KEY }}
-          apk: "libffi-dev;openssl-dev;yaml-dev"
-          skip-binary: aiohttp
-          env-file: true
-          requirements: "requirements.txt"
+          name: wheels-${{ matrix.arch }}
+          path: wheels

      - name: Set version
        if: needs.init.outputs.publish == 'true'
@@ -208,6 +263,13 @@ jobs:
      - name: Checkout the repository
        uses: actions/checkout@1af3b93b6815bc44a9784bd300feb67ff0d1eeb3 # v6.0.0

+      - name: Download local wheels artifact
+        if: needs.init.outputs.requirements == 'true' && needs.init.outputs.publish == 'false'
+        uses: actions/download-artifact@018cc2cf5baa6db3ef3c5f8a56943fffe632ef53 # v6.0.0
+        with:
+          name: wheels-amd64
+          path: wheels
+
      # home-assistant/builder doesn't support sha pinning
      - name: Build the Supervisor
        if: needs.init.outputs.publish != 'true'
--- a/.gitignore
+++ b/.gitignore
@@ -24,6 +24,9 @@ var/
 .installed.cfg
 *.egg

+# Local wheels
+wheels/**/*.whl
+
 # PyInstaller
 #  Usually these files are written by a python script from a template
 #  before PyInstaller builds the exe, so as to inject date/other infos into it.
@@ -102,4 +105,4 @@ ENV/
 /.dmypy.json

 # Mac
-.DS_Store
+.DS_Store
--- a/12
+++ b/12
@@ -32,7 +32,17 @@ RUN \
 # Install requirements
 RUN \
    --mount=type=bind,source=./requirements.txt,target=/usr/src/requirements.txt \
-    uv pip install --compile-bytecode --no-cache --no-build -r requirements.txt
+    --mount=type=bind,source=./wheels,target=/usr/src/wheels \
+    if ls /usr/src/wheels/musllinux/* >/dev/null 2>&1; then \
+        LOCAL_WHEELS=/usr/src/wheels/musllinux; \
+        echo "Using local wheels from: $LOCAL_WHEELS"; \
+    else \
+        LOCAL_WHEELS=; \
+        echo "No local wheels found"; \
+    fi && \
+    uv pip install --compile-bytecode --no-cache --no-build \
+        -r requirements.txt \
+        ${LOCAL_WHEELS:+--find-links $LOCAL_WHEELS}

 # Install Home Assistant Supervisor
 COPY . supervisor
--- a/requirements_tests.txt
+++ b/requirements_tests.txt
@@ -10,7 +10,7 @@ pytest-timeout==2.4.0
 pytest==9.0.1
 ruff==0.14.7
 time-machine==3.1.0
-types-docker==7.1.0.20251129
+types-docker==7.1.0.20251202
 types-pyyaml==6.0.12.20250915
 types-requests==2.32.4.20250913
 urllib3==2.5.0
--- a/supervisor/addons/build.py
+++ b/supervisor/addons/build.py
@@ -23,7 +23,7 @@ from ..const import (
    CpuArch,
 )
 from ..coresys import CoreSys, CoreSysAttributes
-from ..docker.const import DOCKER_HUB
+from ..docker.const import DOCKER_HUB, DOCKER_HUB_LEGACY
 from ..docker.interface import MAP_ARCH
 from ..exceptions import ConfigurationFileError, HassioArchNotFound
 from ..utils.common import FileConfiguration, find_one_filetype
@@ -155,8 +155,11 @@ class AddonBuild(FileConfiguration, CoreSysAttributes):

        # Use the actual registry URL for the key
        # Docker Hub uses "https://index.docker.io/v1/" as the key
+        # Support both docker.io (official) and hub.docker.com (legacy)
        registry_key = (
-            "https://index.docker.io/v1/" if registry == DOCKER_HUB else registry
+            "https://index.docker.io/v1/"
+            if registry in (DOCKER_HUB, DOCKER_HUB_LEGACY)
+            else registry
        )

        config = {"auths": {registry_key: {"auth": auth_string}}}
--- a/supervisor/docker/const.py
+++ b/supervisor/docker/const.py
@@ -10,11 +10,13 @@ from docker.types import Mount

 from ..const import MACHINE_ID

+RE_RETRYING_DOWNLOAD_STATUS = re.compile(r"Retrying in \d+ seconds?")
+
 # Docker Hub registry identifier
 DOCKER_HUB = "hub.docker.com"

-# Regex to match images with a registry host (e.g., ghcr.io/org/image)
-IMAGE_WITH_HOST = re.compile(r"^((?:[a-z0-9]+(?:-[a-z0-9]+)*\.)+[a-z]{2,})\/.+")
+# Legacy Docker Hub identifier for backward compatibility
+DOCKER_HUB_LEGACY = "hub.docker.com"


 class Capabilities(StrEnum):
--- a/supervisor/docker/interface.py
+++ b/supervisor/docker/interface.py
@@ -42,7 +42,7 @@ from ..jobs.decorator import Job
 from ..jobs.job_group import JobGroup
 from ..resolution.const import ContextType, IssueType, SuggestionType
 from ..utils.sentry import async_capture_exception
-from .const import DOCKER_HUB, ContainerState, RestartPolicy
+from .const import DOCKER_HUB, DOCKER_HUB_LEGACY, ContainerState, RestartPolicy
 from .manager import CommandReturn, PullLogEntry
 from .monitor import DockerContainerStateEvent
 from .pull_progress import ImagePullProgress
@@ -182,7 +182,8 @@ class DockerInterface(JobGroup, ABC):
            stored = self.sys_docker.config.registries[registry]
            credentials[ATTR_USERNAME] = stored[ATTR_USERNAME]
            credentials[ATTR_PASSWORD] = stored[ATTR_PASSWORD]
-            if registry != DOCKER_HUB:
+            # Don't include registry for Docker Hub (both official and legacy)
+            if registry not in (DOCKER_HUB, DOCKER_HUB_LEGACY):
                credentials[ATTR_REGISTRY] = registry

            _LOGGER.debug(
@@ -212,9 +213,26 @@ class DockerInterface(JobGroup, ABC):
            raise ValueError("Cannot pull without an image!")

        image_arch = arch or self.sys_arch.supervisor
+        platform = MAP_ARCH[image_arch]
        pull_progress = ImagePullProgress()
        current_job = self.sys_jobs.current

+        # Try to fetch manifest for accurate size-based progress
+        # This is optional - if it fails, we fall back to count-based progress
+        try:
+            manifest = await self.sys_docker.manifest_fetcher.get_manifest(
+                image, str(version), platform=platform
+            )
+            if manifest:
+                pull_progress.set_manifest(manifest)
+                _LOGGER.debug(
+                    "Using manifest for progress: %d layers, %d bytes",
+                    manifest.layer_count,
+                    manifest.total_size,
+                )
+        except Exception as err:  # noqa: BLE001
+            _LOGGER.debug("Could not fetch manifest for progress: %s", err)
+
        async def process_pull_event(event: PullLogEntry) -> None:
            """Process pull event and update job progress."""
            if event.job_id != current_job.uuid:
@@ -243,7 +261,7 @@ class DockerInterface(JobGroup, ABC):
                current_job.uuid,
                image,
                str(version),
-                platform=MAP_ARCH[image_arch],
+                platform=platform,
                auth=credentials,
            )

--- a/supervisor/docker/manager.py
+++ b/supervisor/docker/manager.py
@@ -49,9 +49,11 @@ from ..exceptions import (
 )
 from ..utils.common import FileConfiguration
 from ..validate import SCHEMA_DOCKER_CONFIG
-from .const import DOCKER_HUB, IMAGE_WITH_HOST, LABEL_MANAGED
+from .const import DOCKER_HUB, DOCKER_HUB_LEGACY, LABEL_MANAGED
+from .manifest import RegistryManifestFetcher
 from .monitor import DockerMonitor
 from .network import DockerNetwork
+from .utils import get_registry_from_image

 _LOGGER: logging.Logger = logging.getLogger(__name__)

@@ -212,19 +214,25 @@ class DockerConfig(FileConfiguration):

        Matches the image against configured registries and returns the registry
        name if found, or None if no matching credentials are configured.
+
+        Uses Docker's domain detection logic from:
+        vendor/github.com/distribution/reference/normalize.go
        """
        if not self.registries:
            return None

        # Check if image uses a custom registry (e.g., ghcr.io/org/image)
-        matcher = IMAGE_WITH_HOST.match(image)
-        if matcher:
-            registry = matcher.group(1)
+        registry = get_registry_from_image(image)
+        if registry:
            if registry in self.registries:
                return registry
-        # If no registry prefix, check for Docker Hub credentials
-        elif DOCKER_HUB in self.registries:
-            return DOCKER_HUB
+        else:
+            # No registry prefix means Docker Hub
+            # Support both docker.io (official) and hub.docker.com (legacy)
+            if DOCKER_HUB in self.registries:
+                return DOCKER_HUB
+            if DOCKER_HUB_LEGACY in self.registries:
+                return DOCKER_HUB_LEGACY

        return None

@@ -251,6 +259,9 @@ class DockerAPI(CoreSysAttributes):
        self._info: DockerInfo | None = None
        self.config: DockerConfig = DockerConfig()
        self._monitor: DockerMonitor = DockerMonitor(coresys)
+        self._manifest_fetcher: RegistryManifestFetcher = RegistryManifestFetcher(
+            coresys
+        )

    async def post_init(self) -> Self:
        """Post init actions that must be done in event loop."""
@@ -316,6 +327,11 @@ class DockerAPI(CoreSysAttributes):
        """Return docker events monitor."""
        return self._monitor

+    @property
+    def manifest_fetcher(self) -> RegistryManifestFetcher:
+        """Return manifest fetcher for registry access."""
+        return self._manifest_fetcher
+
    async def load(self) -> None:
        """Start docker events monitor."""
        await self.monitor.load()
@@ -761,7 +777,7 @@ class DockerAPI(CoreSysAttributes):
        """Import a tar file as image."""
        try:
            with tar_file.open("rb") as read_tar:
-                resp: list[dict[str, Any]] = self.images.import_image(read_tar)
+                resp: list[dict[str, Any]] = await self.images.import_image(read_tar)
        except (aiodocker.DockerError, OSError) as err:
            raise DockerError(
                f"Can't import image from tar: {err}", _LOGGER.error
--- a/supervisor/docker/manifest.py
+++ b/supervisor/docker/manifest.py
@@ -0,0 +1,354 @@
+"""Docker registry manifest fetcher.
+
+Fetches image manifests directly from container registries to get layer sizes
+before pulling an image. This enables accurate size-based progress tracking.
+"""
+
+from __future__ import annotations
+
+from dataclasses import dataclass
+import logging
+import re
+from typing import TYPE_CHECKING
+
+import aiohttp
+
+from .const import DOCKER_HUB, IMAGE_WITH_HOST
+
+if TYPE_CHECKING:
+    from ..coresys import CoreSys
+
+_LOGGER = logging.getLogger(__name__)
+
+# Default registry for images without explicit host
+DEFAULT_REGISTRY = "registry-1.docker.io"
+
+# Media types for manifest requests
+MANIFEST_MEDIA_TYPES = (
+    "application/vnd.docker.distribution.manifest.v2+json",
+    "application/vnd.oci.image.manifest.v1+json",
+    "application/vnd.docker.distribution.manifest.list.v2+json",
+    "application/vnd.oci.image.index.v1+json",
+)
+
+
+@dataclass
+class ImageManifest:
+    """Container image manifest with layer information."""
+
+    digest: str
+    total_size: int
+    layers: dict[str, int]  # digest -> size in bytes
+
+    @property
+    def layer_count(self) -> int:
+        """Return number of layers."""
+        return len(self.layers)
+
+
+def parse_image_reference(image: str, tag: str) -> tuple[str, str, str]:
+    """Parse image reference into (registry, repository, tag).
+
+    Examples:
+        ghcr.io/home-assistant/home-assistant:2025.1.0
+            -> (ghcr.io, home-assistant/home-assistant, 2025.1.0)
+        homeassistant/home-assistant:latest
+            -> (registry-1.docker.io, homeassistant/home-assistant, latest)
+        alpine:3.18
+            -> (registry-1.docker.io, library/alpine, 3.18)
+
+    """
+    # Check if image has explicit registry host
+    match = IMAGE_WITH_HOST.match(image)
+    if match:
+        registry = match.group(1)
+        repository = image[len(registry) + 1 :]  # Remove "registry/" prefix
+    else:
+        registry = DEFAULT_REGISTRY
+        repository = image
+        # Docker Hub requires "library/" prefix for official images
+        if "/" not in repository:
+            repository = f"library/{repository}"
+
+    return registry, repository, tag
+
+
+class RegistryManifestFetcher:
+    """Fetches manifests from container registries."""
+
+    def __init__(self, coresys: CoreSys) -> None:
+        """Initialize the fetcher."""
+        self.coresys = coresys
+        self._session: aiohttp.ClientSession | None = None
+
+    async def _get_session(self) -> aiohttp.ClientSession:
+        """Get or create aiohttp session."""
+        if self._session is None or self._session.closed:
+            self._session = aiohttp.ClientSession()
+        return self._session
+
+    async def close(self) -> None:
+        """Close the session."""
+        if self._session and not self._session.closed:
+            await self._session.close()
+            self._session = None
+
+    def _get_credentials(self, registry: str) -> tuple[str, str] | None:
+        """Get credentials for registry from Docker config.
+
+        Returns (username, password) tuple or None if no credentials.
+        """
+        registries = self.coresys.docker.config.registries
+
+        # Map registry hostname to config key
+        # Docker Hub can be stored as "hub.docker.com" in config
+        if registry == DEFAULT_REGISTRY:
+            if DOCKER_HUB in registries:
+                creds = registries[DOCKER_HUB]
+                return creds.get("username"), creds.get("password")
+        elif registry in registries:
+            creds = registries[registry]
+            return creds.get("username"), creds.get("password")
+
+        return None
+
+    async def _get_auth_token(
+        self,
+        session: aiohttp.ClientSession,
+        registry: str,
+        repository: str,
+    ) -> str | None:
+        """Get authentication token for registry.
+
+        Uses the WWW-Authenticate header from a 401 response to discover
+        the token endpoint, then requests a token with appropriate scope.
+        """
+        # First, make an unauthenticated request to get WWW-Authenticate header
+        manifest_url = f"https://{registry}/v2/{repository}/manifests/latest"
+
+        try:
+            async with session.get(manifest_url) as resp:
+                if resp.status == 200:
+                    # No auth required
+                    return None
+
+                if resp.status != 401:
+                    _LOGGER.warning(
+                        "Unexpected status %d from registry %s", resp.status, registry
+                    )
+                    return None
+
+                www_auth = resp.headers.get("WWW-Authenticate", "")
+        except aiohttp.ClientError as err:
+            _LOGGER.warning("Failed to connect to registry %s: %s", registry, err)
+            return None
+
+        # Parse WWW-Authenticate: Bearer realm="...",service="...",scope="..."
+        if not www_auth.startswith("Bearer "):
+            _LOGGER.warning("Unsupported auth type from %s: %s", registry, www_auth)
+            return None
+
+        params = {}
+        for match in re.finditer(r'(\w+)="([^"]*)"', www_auth):
+            params[match.group(1)] = match.group(2)
+
+        realm = params.get("realm")
+        service = params.get("service")
+
+        if not realm:
+            _LOGGER.warning("No realm in WWW-Authenticate from %s", registry)
+            return None
+
+        # Build token request URL
+        token_url = f"{realm}?scope=repository:{repository}:pull"
+        if service:
+            token_url += f"&service={service}"
+
+        # Check for credentials
+        auth = None
+        credentials = self._get_credentials(registry)
+        if credentials:
+            username, password = credentials
+            if username and password:
+                auth = aiohttp.BasicAuth(username, password)
+                _LOGGER.debug("Using credentials for %s", registry)
+
+        try:
+            async with session.get(token_url, auth=auth) as resp:
+                if resp.status != 200:
+                    _LOGGER.warning(
+                        "Failed to get token from %s: %d", realm, resp.status
+                    )
+                    return None
+
+                data = await resp.json()
+                return data.get("token") or data.get("access_token")
+        except aiohttp.ClientError as err:
+            _LOGGER.warning("Failed to get auth token: %s", err)
+            return None
+
+    async def _fetch_manifest(
+        self,
+        session: aiohttp.ClientSession,
+        registry: str,
+        repository: str,
+        reference: str,
+        token: str | None,
+        platform: str | None = None,
+    ) -> dict | None:
+        """Fetch manifest from registry.
+
+        If the manifest is a manifest list (multi-arch), fetches the
+        platform-specific manifest.
+        """
+        manifest_url = f"https://{registry}/v2/{repository}/manifests/{reference}"
+
+        headers = {"Accept": ", ".join(MANIFEST_MEDIA_TYPES)}
+        if token:
+            headers["Authorization"] = f"Bearer {token}"
+
+        try:
+            async with session.get(manifest_url, headers=headers) as resp:
+                if resp.status != 200:
+                    _LOGGER.warning(
+                        "Failed to fetch manifest for %s/%s:%s - %d",
+                        registry,
+                        repository,
+                        reference,
+                        resp.status,
+                    )
+                    return None
+
+                manifest = await resp.json()
+        except aiohttp.ClientError as err:
+            _LOGGER.warning("Failed to fetch manifest: %s", err)
+            return None
+
+        media_type = manifest.get("mediaType", "")
+
+        # Check if this is a manifest list (multi-arch image)
+        if "list" in media_type or "index" in media_type:
+            manifests = manifest.get("manifests", [])
+            if not manifests:
+                _LOGGER.warning("Empty manifest list for %s/%s", registry, repository)
+                return None
+
+            # Find matching platform
+            target_os = "linux"
+            target_arch = "amd64"  # Default
+
+            if platform:
+                # Platform format is "linux/amd64", "linux/arm64", etc.
+                parts = platform.split("/")
+                if len(parts) >= 2:
+                    target_os, target_arch = parts[0], parts[1]
+
+            platform_manifest = None
+            for m in manifests:
+                plat = m.get("platform", {})
+                if (
+                    plat.get("os") == target_os
+                    and plat.get("architecture") == target_arch
+                ):
+                    platform_manifest = m
+                    break
+
+            if not platform_manifest:
+                # Fall back to first manifest
+                _LOGGER.debug(
+                    "Platform %s/%s not found, using first manifest",
+                    target_os,
+                    target_arch,
+                )
+                platform_manifest = manifests[0]
+
+            # Fetch the platform-specific manifest
+            return await self._fetch_manifest(
+                session,
+                registry,
+                repository,
+                platform_manifest["digest"],
+                token,
+                platform,
+            )
+
+        return manifest
+
+    async def get_manifest(
+        self,
+        image: str,
+        tag: str,
+        platform: str | None = None,
+    ) -> ImageManifest | None:
+        """Fetch manifest and extract layer sizes.
+
+        Args:
+            image: Image name (e.g., "ghcr.io/home-assistant/home-assistant")
+            tag: Image tag (e.g., "2025.1.0")
+            platform: Target platform (e.g., "linux/amd64")
+
+        Returns:
+            ImageManifest with layer sizes, or None if fetch failed.
+
+        """
+        registry, repository, tag = parse_image_reference(image, tag)
+
+        _LOGGER.debug(
+            "Fetching manifest for %s/%s:%s (platform=%s)",
+            registry,
+            repository,
+            tag,
+            platform,
+        )
+
+        session = await self._get_session()
+
+        # Get auth token
+        token = await self._get_auth_token(session, registry, repository)
+
+        # Fetch manifest
+        manifest = await self._fetch_manifest(
+            session, registry, repository, tag, token, platform
+        )
+
+        if not manifest:
+            return None
+
+        # Extract layer information
+        layers = manifest.get("layers", [])
+        if not layers:
+            _LOGGER.warning(
+                "No layers in manifest for %s/%s:%s", registry, repository, tag
+            )
+            return None
+
+        layer_sizes: dict[str, int] = {}
+        total_size = 0
+
+        for layer in layers:
+            digest = layer.get("digest", "")
+            size = layer.get("size", 0)
+            if digest and size:
+                # Store by short digest (first 12 chars after sha256:)
+                short_digest = (
+                    digest.split(":")[1][:12] if ":" in digest else digest[:12]
+                )
+                layer_sizes[short_digest] = size
+                total_size += size
+
+        digest = manifest.get("config", {}).get("digest", "")
+
+        _LOGGER.debug(
+            "Manifest for %s/%s:%s - %d layers, %d bytes total",
+            registry,
+            repository,
+            tag,
+            len(layer_sizes),
+            total_size,
+        )
+
+        return ImageManifest(
+            digest=digest,
+            total_size=total_size,
+            layers=layer_sizes,
+        )
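
The layer extraction at the end of get_manifest() above can be sketched on its own, without any network access. This is an illustration under stated assumptions: the function name `extract_layer_sizes` is hypothetical, and the manifest dictionary and its digests are invented sample data that only follow the `layers` / `digest` / `size` shape the code above reads.

```python
def extract_layer_sizes(manifest: dict) -> tuple[dict[str, int], int]:
    """Return ({short_digest: size}, total_size) for a manifest's layers."""
    layer_sizes: dict[str, int] = {}
    total_size = 0

    for layer in manifest.get("layers", []):
        digest = layer.get("digest", "")
        size = layer.get("size", 0)
        if not digest or not size:
            # Skip entries without a usable digest or size.
            continue
        # Key by the first 12 characters after the "sha256:" prefix.
        short = digest.split(":")[1][:12] if ":" in digest else digest[:12]
        layer_sizes[short] = size
        total_size += size

    return layer_sizes, total_size


# Invented sample manifest; the digests are placeholders, not real layers.
sample = {
    "layers": [
        {"digest": "sha256:" + "a" * 64, "size": 1000},
        {"digest": "sha256:" + "b" * 64, "size": 2500},
        {"digest": "sha256:" + "c" * 64, "size": 0},
    ]
}
sizes, total = extract_layer_sizes(sample)
print(sizes)  # {'aaaaaaaaaaaa': 1000, 'bbbbbbbbbbbb': 2500}
print(total)  # 3500
```

Keying by the 12-character short digest is what lets set_manifest() and get_or_create_layer() in pull_progress.py look up a size by the layer ID seen in pull events.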
--- a/supervisor/docker/pull_progress.py
+++ b/supervisor/docker/pull_progress.py
@@ -10,6 +10,7 @@ from typing import TYPE_CHECKING, cast

 if TYPE_CHECKING:
    from .manager import PullLogEntry
+    from .manifest import ImageManifest

 _LOGGER = logging.getLogger(__name__)

@@ -109,23 +110,43 @@ class LayerProgress:
 class ImagePullProgress:
    """Track overall progress of pulling an image.

-    Uses count-based progress where each layer contributes equally regardless of size.
-    This avoids progress regression when large layers are discovered late due to
-    Docker's rate-limiting of concurrent downloads.
+    When manifest layer sizes are provided, uses size-weighted progress where
+    each layer contributes proportionally to its size. This gives accurate
+    progress based on actual bytes to download.

-    Progress is only reported after the first "Downloading" event, since Docker
-    sends "Already exists" and "Pulling fs layer" events before we know the full
-    layer count.
+    When manifest is not available, falls back to count-based progress where
+    each layer contributes equally.
+
+    Layers that already exist locally are excluded from the progress calculation.
    """

    layers: dict[str, LayerProgress] = field(default_factory=dict)
    _last_reported_progress: float = field(default=0.0, repr=False)
    _seen_downloading: bool = field(default=False, repr=False)
+    _manifest_layer_sizes: dict[str, int] = field(default_factory=dict, repr=False)
+    _total_manifest_size: int = field(default=0, repr=False)
+
+    def set_manifest(self, manifest: ImageManifest) -> None:
+        """Set manifest layer sizes for accurate size-based progress.
+
+        Should be called before processing pull events.
+        """
+        self._manifest_layer_sizes = dict(manifest.layers)
+        self._total_manifest_size = manifest.total_size
+        _LOGGER.debug(
+            "Manifest set: %d layers, %d bytes total",
+            len(self._manifest_layer_sizes),
+            self._total_manifest_size,
+        )

    def get_or_create_layer(self, layer_id: str) -> LayerProgress:
        """Get existing layer or create new one."""
        if layer_id not in self.layers:
-            self.layers[layer_id] = LayerProgress(layer_id=layer_id)
+            # If we have manifest sizes, pre-populate the layer's total_size
+            manifest_size = self._manifest_layer_sizes.get(layer_id, 0)
+            self.layers[layer_id] = LayerProgress(
+                layer_id=layer_id, total_size=manifest_size
+            )
        return self.layers[layer_id]

    def process_event(self, entry: PullLogEntry) -> None:
@@ -237,8 +258,13 @@ class ImagePullProgress:
    def calculate_progress(self) -> float:
        """Calculate overall progress 0-100.

-        Uses count-based progress where each layer that needs pulling contributes
-        equally. Layers that already exist locally are excluded from the calculation.
+        When manifest layer sizes are available, uses size-weighted progress
+        where each layer contributes proportionally to its size.
+
+        When manifest is not available, falls back to count-based progress
+        where each layer contributes equally.
+
+        Layers that already exist locally are excluded from the calculation.

        Returns 0 until we've seen the first "Downloading" event, since Docker
        reports "Already exists" and "Pulling fs layer" events before we know
@@ -258,9 +284,38 @@ class ImagePullProgress:
            # All layers already exist, nothing to download
            return 100.0

-        # Each layer contributes equally: sum of layer progresses / total layers
+        # Use size-weighted progress if manifest sizes are available
+        if self._manifest_layer_sizes:
+            return min(100, self._calculate_size_weighted_progress(layers_to_pull))
+
+        # Fall back to count-based progress
        total_progress = sum(layer.calculate_progress() for layer in layers_to_pull)
-        return total_progress / len(layers_to_pull)
+        return min(100, total_progress / len(layers_to_pull))
+
+    def _calculate_size_weighted_progress(
+        self, layers_to_pull: list[LayerProgress]
+    ) -> float:
+        """Calculate size-weighted progress.
+
+        Each layer contributes to progress proportionally to its size.
+        Progress = sum(layer_progress * layer_size) / total_size
+        """
+        # Calculate total size of layers that need pulling
+        total_size = sum(layer.total_size for layer in layers_to_pull)
+
+        if total_size == 0:
+            # No size info available, fall back to count-based
+            total_progress = sum(layer.calculate_progress() for layer in layers_to_pull)
+            return total_progress / len(layers_to_pull)
+
+        # Weight each layer's progress by its size
+        weighted_progress = 0.0
+        for layer in layers_to_pull:
+            if layer.total_size > 0:
+                layer_weight = layer.total_size / total_size
+                weighted_progress += layer.calculate_progress() * layer_weight
+
+        return weighted_progress

    def get_stage(self) -> str | None:
        """Get current stage based on layer states."""
--- a/supervisor/docker/utils.py
+++ b/supervisor/docker/utils.py
@@ -0,0 +1,57 @@
+"""Docker utilities."""
+
+from __future__ import annotations
+
+import re
+
+# Docker image reference domain regex
+# Based on Docker's reference implementation:
+# vendor/github.com/distribution/reference/normalize.go
+#
+# A domain is detected if the part before the first / is or contains:
+# - "localhost" (with optional port)
+# - a "." (like registry.example.com or 127.0.0.1)
+# - a ":" (like myregistry:5000)
+# - an IPv6 address in brackets (like [::1]:5000)
+#
+# Note: Docker also treats uppercase letters as a registry indicator, since
+# namespaces must be lowercase. This regex does not implement that rule;
+# get_registry_from_image() only checks for ".", ":" or "localhost".
+IMAGE_REGISTRY_REGEX = re.compile(
+    r"^(?P<registry>"
+    r"localhost(?::[0-9]+)?|"  # localhost with optional port
+    r"(?:[a-zA-Z0-9]|[a-zA-Z0-9][a-zA-Z0-9-]*[a-zA-Z0-9])"  # domain component
+    r"(?:\.(?:[a-zA-Z0-9]|[a-zA-Z0-9][a-zA-Z0-9-]*[a-zA-Z0-9]))*"  # more components
+    r"(?::[0-9]+)?|"  # optional port
+    r"\[[a-fA-F0-9:]+\](?::[0-9]+)?"  # IPv6 with optional port
+    r")/"  # must be followed by /
+)
+
+
+def get_registry_from_image(image_ref: str) -> str | None:
+    """Extract registry from Docker image reference.
+
+    Returns the registry if the image reference contains one,
+    or None if the image uses Docker Hub (docker.io).
+
+    Based on Docker's reference implementation:
+    vendor/github.com/distribution/reference/normalize.go
+
+    Examples:
+        get_registry_from_image("nginx")                        -> None (docker.io)
+        get_registry_from_image("library/nginx")                -> None (docker.io)
+        get_registry_from_image("myregistry.com/nginx")         -> "myregistry.com"
+        get_registry_from_image("localhost/myimage")            -> "localhost"
+        get_registry_from_image("localhost:5000/myimage")       -> "localhost:5000"
+        get_registry_from_image("registry.io:5000/org/app:v1")  -> "registry.io:5000"
+        get_registry_from_image("[::1]:5000/myimage")           -> "[::1]:5000"
+
+    """
+    match = IMAGE_REGISTRY_REGEX.match(image_ref)
+    if match:
+        registry = match.group("registry")
+        # Must contain '.' or ':' or be 'localhost' to be a real registry
+        # This prevents treating "myuser/myimage" as having registry "myuser"
+        if "." in registry or ":" in registry or registry == "localhost":
+            return registry
+    return None  # No registry = Docker Hub (docker.io)
--- a/supervisor/store/git.py
+++ b/supervisor/store/git.py
@@ -183,19 +183,22 @@ class GitRepo(CoreSysAttributes):
                raise StoreGitError() from err

            try:
-                branch = self.repo.active_branch.name
+                repo = self.repo

-                # Download data
-                await self.sys_run_in_executor(
-                    ft.partial(
-                        self.repo.remotes.origin.fetch,
-                        **{"update-shallow": True, "depth": 1},  # type: ignore
+                def _fetch_and_check() -> tuple[str, bool]:
+                    """Fetch from origin and check if changed."""
+                    # This property access is I/O bound
+                    branch = repo.active_branch.name
+                    repo.remotes.origin.fetch(
+                        **{"update-shallow": True, "depth": 1}  # type: ignore[arg-type]
                    )
-                )
+                    changed = repo.commit(branch) != repo.commit(f"origin/{branch}")
+                    return branch, changed

-                if changed := self.repo.commit(branch) != self.repo.commit(
-                    f"origin/{branch}"
-                ):
+                # Download data and check for changes
+                branch, changed = await self.sys_run_in_executor(_fetch_and_check)
+
+                if changed:
                    # Jump on top of that
                    await self.sys_run_in_executor(
                        ft.partial(self.repo.git.reset, f"origin/{branch}", hard=True)
--- a/tests/conftest.py
+++ b/tests/conftest.py
@@ -144,9 +144,9 @@ async def docker() -> DockerAPI:

        docker_images.inspect.return_value = image_inspect
        docker_images.list.return_value = [image_inspect]
-        docker_images.import_image.return_value = [
-            {"stream": "Loaded image: test:latest\n"}
-        ]
+        docker_images.import_image = AsyncMock(
+            return_value=[{"stream": "Loaded image: test:latest\n"}]
+        )

        docker_images.pull.return_value = AsyncIterator([{}])

@@ -154,6 +154,9 @@ async def docker() -> DockerAPI:
        docker_obj.info.storage = "overlay2"
        docker_obj.info.version = AwesomeVersion("1.0.0")

+        # Mock manifest fetcher to return None (falls back to count-based progress)
+        docker_obj._manifest_fetcher.get_manifest = AsyncMock(return_value=None)
+
        yield docker_obj


--- a/tests/docker/test_credentials.py
+++ b/tests/docker/test_credentials.py
@@ -1,9 +1,49 @@
 """Test docker login."""

+import pytest
+
 # pylint: disable=protected-access
 from supervisor.coresys import CoreSys
-from supervisor.docker.const import DOCKER_HUB
+from supervisor.docker.const import DOCKER_HUB, DOCKER_HUB_LEGACY
 from supervisor.docker.interface import DockerInterface
+from supervisor.docker.utils import get_registry_from_image
+
+
+@pytest.mark.parametrize(
+    ("image_ref", "expected_registry"),
+    [
+        # No registry - Docker Hub images
+        ("nginx", None),
+        ("nginx:latest", None),
+        ("library/nginx", None),
+        ("library/nginx:latest", None),
+        ("homeassistant/amd64-supervisor", None),
+        ("homeassistant/amd64-supervisor:1.2.3", None),
+        # Registry with dot
+        ("ghcr.io/homeassistant/amd64-supervisor", "ghcr.io"),
+        ("ghcr.io/homeassistant/amd64-supervisor:latest", "ghcr.io"),
+        ("myregistry.com/nginx", "myregistry.com"),
+        ("registry.example.com/org/image:v1", "registry.example.com"),
+        ("127.0.0.1/myimage", "127.0.0.1"),
+        # Registry with port
+        ("myregistry:5000/myimage", "myregistry:5000"),
+        ("localhost:5000/myimage", "localhost:5000"),
+        ("registry.io:5000/org/app:v1", "registry.io:5000"),
+        # localhost special case
+        ("localhost/myimage", "localhost"),
+        ("localhost/myimage:tag", "localhost"),
+        # IPv6
+        ("[::1]:5000/myimage", "[::1]:5000"),
+        ("[2001:db8::1]:5000/myimage:tag", "[2001:db8::1]:5000"),
+    ],
+)
+def test_get_registry_from_image(image_ref: str, expected_registry: str | None):
+    """Test get_registry_from_image extracts registry from image reference.
+
+    Based on Docker's reference implementation:
+    vendor/github.com/distribution/reference/normalize.go
+    """
+    assert get_registry_from_image(image_ref) == expected_registry


 def test_no_credentials(coresys: CoreSys, test_docker_interface: DockerInterface):
@@ -47,3 +87,36 @@ def test_matching_credentials(coresys: CoreSys, test_docker_interface: DockerInt
    )
    assert credentials["username"] == "Spongebob Squarepants"
    assert "registry" not in credentials
+
+
+def test_legacy_docker_hub_credentials(
+    coresys: CoreSys, test_docker_interface: DockerInterface
+):
+    """Test legacy hub.docker.com credentials are used for Docker Hub images."""
+    coresys.docker.config._data["registries"] = {
+        DOCKER_HUB_LEGACY: {"username": "LegacyUser", "password": "Password1!"},
+    }
+
+    credentials = test_docker_interface._get_credentials(
+        "homeassistant/amd64-supervisor"
+    )
+    assert credentials["username"] == "LegacyUser"
+    # No registry should be included for Docker Hub
+    assert "registry" not in credentials
+
+
+def test_docker_hub_preferred_over_legacy(
+    coresys: CoreSys, test_docker_interface: DockerInterface
+):
+    """Test docker.io is preferred over legacy hub.docker.com when both exist."""
+    coresys.docker.config._data["registries"] = {
+        DOCKER_HUB: {"username": "NewUser", "password": "Password1!"},
+        DOCKER_HUB_LEGACY: {"username": "LegacyUser", "password": "Password2!"},
+    }
+
+    credentials = test_docker_interface._get_credentials(
+        "homeassistant/amd64-supervisor"
+    )
+    # docker.io should be preferred
+    assert credentials["username"] == "NewUser"
+    assert "registry" not in credentials
--- a/tests/docker/test_manager.py
+++ b/tests/docker/test_manager.py
@@ -2,7 +2,7 @@

 import asyncio
 from pathlib import Path
-from unittest.mock import MagicMock, patch
+from unittest.mock import AsyncMock, MagicMock, patch

 from docker.errors import APIError, DockerException, NotFound
 import pytest
@@ -412,9 +412,9 @@ async def test_repair_failures(coresys: CoreSys, caplog: pytest.LogCaptureFixtur
 async def test_import_image(coresys: CoreSys, tmp_path: Path, log_starter: str):
    """Test importing an image into docker."""
    (test_tar := tmp_path / "test.tar").touch()
-    coresys.docker.images.import_image.return_value = [
-        {"stream": f"{log_starter}: imported"}
-    ]
+    coresys.docker.images.import_image = AsyncMock(
+        return_value=[{"stream": f"{log_starter}: imported"}]
+    )
    coresys.docker.images.inspect.return_value = {"Id": "imported"}

    image = await coresys.docker.import_image(test_tar)
@@ -426,9 +426,9 @@ async def test_import_image(coresys: CoreSys, tmp_path: Path, log_starter: str):
 async def test_import_image_error(coresys: CoreSys, tmp_path: Path):
    """Test failure importing an image into docker."""
    (test_tar := tmp_path / "test.tar").touch()
-    coresys.docker.images.import_image.return_value = [
-        {"errorDetail": {"message": "fail"}}
-    ]
+    coresys.docker.images.import_image = AsyncMock(
+        return_value=[{"errorDetail": {"message": "fail"}}]
+    )

    with pytest.raises(DockerError, match="Can't import image from tar: fail"):
        await coresys.docker.import_image(test_tar)
@@ -441,10 +441,12 @@ async def test_import_multiple_images_in_tar(
 ):
    """Test importing an image into docker."""
    (test_tar := tmp_path / "test.tar").touch()
-    coresys.docker.images.import_image.return_value = [
-        {"stream": "Loaded image: imported-1"},
-        {"stream": "Loaded image: imported-2"},
-    ]
+    coresys.docker.images.import_image = AsyncMock(
+        return_value=[
+            {"stream": "Loaded image: imported-1"},
+            {"stream": "Loaded image: imported-2"},
+        ]
+    )

    assert await coresys.docker.import_image(test_tar) is None

--- a/tests/docker/test_manifest.py
+++ b/tests/docker/test_manifest.py
@@ -0,0 +1,164 @@
+"""Tests for registry manifest fetcher."""
+
+from unittest.mock import AsyncMock, MagicMock, patch
+
+import pytest
+
+from supervisor.docker.manifest import (
+    DEFAULT_REGISTRY,
+    ImageManifest,
+    RegistryManifestFetcher,
+    parse_image_reference,
+)
+
+
+class TestParseImageReference:
+    """Tests for parse_image_reference function."""
+
+    def test_ghcr_io_image(self):
+        """Test parsing ghcr.io image."""
+        registry, repo, tag = parse_image_reference(
+            "ghcr.io/home-assistant/home-assistant", "2025.1.0"
+        )
+        assert registry == "ghcr.io"
+        assert repo == "home-assistant/home-assistant"
+        assert tag == "2025.1.0"
+
+    def test_docker_hub_with_org(self):
+        """Test parsing Docker Hub image with organization."""
+        registry, repo, tag = parse_image_reference(
+            "homeassistant/home-assistant", "latest"
+        )
+        assert registry == DEFAULT_REGISTRY
+        assert repo == "homeassistant/home-assistant"
+        assert tag == "latest"
+
+    def test_docker_hub_official_image(self):
+        """Test parsing Docker Hub official image (no org)."""
+        registry, repo, tag = parse_image_reference("alpine", "3.18")
+        assert registry == DEFAULT_REGISTRY
+        assert repo == "library/alpine"
+        assert tag == "3.18"
+
+    def test_gcr_io_image(self):
+        """Test parsing gcr.io image."""
+        registry, repo, tag = parse_image_reference("gcr.io/project/image", "v1")
+        assert registry == "gcr.io"
+        assert repo == "project/image"
+        assert tag == "v1"
+
+
+class TestImageManifest:
+    """Tests for ImageManifest dataclass."""
+
+    def test_layer_count(self):
+        """Test layer_count property."""
+        manifest = ImageManifest(
+            digest="sha256:abc",
+            total_size=1000,
+            layers={"layer1": 500, "layer2": 500},
+        )
+        assert manifest.layer_count == 2
+
+
+class TestRegistryManifestFetcher:
+    """Tests for RegistryManifestFetcher class."""
+
+    @pytest.fixture
+    def mock_coresys(self):
+        """Create mock coresys."""
+        coresys = MagicMock()
+        coresys.docker.config.registries = {}
+        return coresys
+
+    @pytest.fixture
+    def fetcher(self, mock_coresys):
+        """Create fetcher instance."""
+        return RegistryManifestFetcher(mock_coresys)
+
+    async def test_get_manifest_success(self, fetcher):
+        """Test successful manifest fetch by mocking internal methods."""
+        manifest_data = {
+            "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
+            "config": {"digest": "sha256:abc123"},
+            "layers": [
+                {"digest": "sha256:layer1abc123def456789012", "size": 1000},
+                {"digest": "sha256:layer2def456abc789012345", "size": 2000},
+            ],
+        }
+
+        # Mock the internal methods instead of the session
+        with (
+            patch.object(
+                fetcher, "_get_auth_token", new=AsyncMock(return_value="test-token")
+            ),
+            patch.object(
+                fetcher, "_fetch_manifest", new=AsyncMock(return_value=manifest_data)
+            ),
+            patch.object(fetcher, "_get_session", new=AsyncMock()),
+        ):
+            result = await fetcher.get_manifest(
+                "test.io/org/image", "v1.0", platform="linux/amd64"
+            )
+
+            assert result is not None
+            assert result.total_size == 3000
+            assert result.layer_count == 2
+            # First 12 chars after sha256:
+            assert "layer1abc123" in result.layers
+            assert result.layers["layer1abc123"] == 1000
+
+    async def test_get_manifest_returns_none_on_failure(self, fetcher):
+        """Test that get_manifest returns None on failure."""
+        with (
+            patch.object(
+                fetcher, "_get_auth_token", new=AsyncMock(return_value="test-token")
+            ),
+            patch.object(fetcher, "_fetch_manifest", new=AsyncMock(return_value=None)),
+            patch.object(fetcher, "_get_session", new=AsyncMock()),
+        ):
+            result = await fetcher.get_manifest(
+                "test.io/org/image", "v1.0", platform="linux/amd64"
+            )
+
+            assert result is None
+
+    async def test_close_session(self, fetcher):
+        """Test session cleanup."""
+        mock_session = AsyncMock()
+        mock_session.closed = False
+        mock_session.close = AsyncMock()
+        fetcher._session = mock_session
+
+        await fetcher.close()
+
+        mock_session.close.assert_called_once()
+        assert fetcher._session is None
+
+    def test_get_credentials_docker_hub(self, mock_coresys, fetcher):
+        """Test getting Docker Hub credentials."""
+        mock_coresys.docker.config.registries = {
+            "hub.docker.com": {"username": "user", "password": "pass"}
+        }
+
+        creds = fetcher._get_credentials(DEFAULT_REGISTRY)
+
+        assert creds == ("user", "pass")
+
+    def test_get_credentials_custom_registry(self, mock_coresys, fetcher):
+        """Test getting credentials for custom registry."""
+        mock_coresys.docker.config.registries = {
+            "ghcr.io": {"username": "user", "password": "token"}
+        }
+
+        creds = fetcher._get_credentials("ghcr.io")
+
+        assert creds == ("user", "token")
+
+    def test_get_credentials_not_found(self, mock_coresys, fetcher):
+        """Test no credentials found."""
+        mock_coresys.docker.config.registries = {}
+
+        creds = fetcher._get_credentials("unknown.io")
+
+        assert creds is None
--- a/tests/docker/test_pull_progress.py
+++ b/tests/docker/test_pull_progress.py
@@ -3,6 +3,7 @@
 import pytest

 from supervisor.docker.manager import PullLogEntry, PullProgressDetail
+from supervisor.docker.manifest import ImageManifest
 from supervisor.docker.pull_progress import (
    DOWNLOAD_WEIGHT,
    EXTRACT_WEIGHT,
@@ -784,3 +785,218 @@ class TestImagePullProgress:
            )

        assert progress.calculate_progress() == 100.0
+
+    def test_size_weighted_progress_with_manifest(self):
+        """Test size-weighted progress when manifest layer sizes are known."""
+        # Create manifest with known layer sizes
+        # Small layer: 1KB, Large layer: 100KB
+        manifest = ImageManifest(
+            digest="sha256:test",
+            total_size=101000,
+            layers={
+                "small123456": 1000,  # 1KB - ~1% of total
+                "large123456": 100000,  # 100KB - ~99% of total
+            },
+        )
+
+        progress = ImagePullProgress()
+        progress.set_manifest(manifest)
+
+        # Layer events - small layer first
+        progress.process_event(
+            PullLogEntry(
+                job_id="test",
+                id="small123456",
+                status="Pulling fs layer",
+                progress_detail=PullProgressDetail(),
+            )
+        )
+        progress.process_event(
+            PullLogEntry(
+                job_id="test",
+                id="large123456",
+                status="Pulling fs layer",
+                progress_detail=PullProgressDetail(),
+            )
+        )
+
+        # Small layer downloads completely
+        progress.process_event(
+            PullLogEntry(
+                job_id="test",
+                id="small123456",
+                status="Downloading",
+                progress_detail=PullProgressDetail(current=1000, total=1000),
+            )
+        )
+
+        # Size-weighted: small layer is ~1% of total size
+        # Small layer at 70% (download done) = contributes ~0.7% to overall
+        assert progress.calculate_progress() == pytest.approx(0.69, rel=0.1)
+
+        # Large layer starts downloading (1% of its size)
+        progress.process_event(
+            PullLogEntry(
+                job_id="test",
+                id="large123456",
+                status="Downloading",
+                progress_detail=PullProgressDetail(current=1000, total=100000),
+            )
+        )
+
+        # Large layer at 1% download = contributes ~0.7% (1% * 70% * 99% weight)
+        # Total: ~0.7% + ~0.7% = ~1.4%
+        current = progress.calculate_progress()
+        assert current > 0.7  # More than just small layer
+        assert current < 5.0  # But not much more
+
+        # Complete both layers
+        progress.process_event(
+            PullLogEntry(
+                job_id="test",
+                id="small123456",
+                status="Pull complete",
+                progress_detail=PullProgressDetail(),
+            )
+        )
+        progress.process_event(
+            PullLogEntry(
+                job_id="test",
+                id="large123456",
+                status="Pull complete",
+                progress_detail=PullProgressDetail(),
+            )
+        )
+
+        assert progress.calculate_progress() == 100.0
+
+    def test_size_weighted_excludes_already_exists(self):
+        """Test that already existing layers are excluded from size-weighted progress."""
+        # Manifest has 3 layers, but one will already exist locally
+        manifest = ImageManifest(
+            digest="sha256:test",
+            total_size=200000,
+            layers={
+                "cached12345": 100000,  # Will be cached - shouldn't count
+                "layer1_1234": 50000,  # Needs pulling
+                "layer2_1234": 50000,  # Needs pulling
+            },
+        )
+
+        progress = ImagePullProgress()
+        progress.set_manifest(manifest)
+
+        # Cached layer already exists
+        progress.process_event(
+            PullLogEntry(
+                job_id="test",
+                id="cached12345",
+                status="Already exists",
+                progress_detail=PullProgressDetail(),
+            )
+        )
+
+        # Other layers need pulling
+        progress.process_event(
+            PullLogEntry(
+                job_id="test",
+                id="layer1_1234",
+                status="Pulling fs layer",
+                progress_detail=PullProgressDetail(),
+            )
+        )
+        progress.process_event(
+            PullLogEntry(
+                job_id="test",
+                id="layer2_1234",
+                status="Pulling fs layer",
+                progress_detail=PullProgressDetail(),
+            )
+        )
+
+        # Start downloading layer1 (50% of its size)
+        progress.process_event(
+            PullLogEntry(
+                job_id="test",
+                id="layer1_1234",
+                status="Downloading",
+                progress_detail=PullProgressDetail(current=25000, total=50000),
+            )
+        )
+
+        # layer1 is 50% of total that needs pulling (50KB out of 100KB)
+        # At 50% download = 35% layer progress (70% * 50%)
+        # Size-weighted: 50% * 35% = 17.5%
+        assert progress.calculate_progress() == pytest.approx(17.5)
+
+        # Complete layer1
+        progress.process_event(
+            PullLogEntry(
+                job_id="test",
+                id="layer1_1234",
+                status="Pull complete",
+                progress_detail=PullProgressDetail(),
+            )
+        )
+
+        # layer1 at 100%, layer2 at 0%
+        # Size-weighted: 50% * 100% + 50% * 0% = 50%
+        assert progress.calculate_progress() == pytest.approx(50.0)
+
+    def test_fallback_to_count_based_without_manifest(self):
+        """Test that without manifest, count-based progress is used."""
+        progress = ImagePullProgress()
+
+        # No manifest set - should use count-based progress
+
+        # Two layers of different sizes
+        progress.process_event(
+            PullLogEntry(
+                job_id="test",
+                id="small",
+                status="Pulling fs layer",
+                progress_detail=PullProgressDetail(),
+            )
+        )
+        progress.process_event(
+            PullLogEntry(
+                job_id="test",
+                id="large",
+                status="Pulling fs layer",
+                progress_detail=PullProgressDetail(),
+            )
+        )
+
+        # Small layer (1KB) completes
+        progress.process_event(
+            PullLogEntry(
+                job_id="test",
+                id="small",
+                status="Downloading",
+                progress_detail=PullProgressDetail(current=1000, total=1000),
+            )
+        )
+        progress.process_event(
+            PullLogEntry(
+                job_id="test",
+                id="small",
+                status="Pull complete",
+                progress_detail=PullProgressDetail(),
+            )
+        )
+
+        # Large layer (100MB) at 1%
+        progress.process_event(
+            PullLogEntry(
+                job_id="test",
+                id="large",
+                status="Downloading",
+                progress_detail=PullProgressDetail(current=1000000, total=100000000),
+            )
+        )
+
+        # Count-based: each layer is 50% weight
+        # small: 100% * 50% = 50%
+        # large: 0.7% (1% * 70%) * 50% = 0.35%
+        # Total: ~50.35%
+        assert progress.calculate_progress() == pytest.approx(50.35, rel=0.01)
--- a/wheels/.gitkeep
+++ b/wheels/.gitkeep
Author	SHA1	Message	Date
Stefan Agner	f2596feb17	Clamp progress to 100 to prevent floating point precision issues Floating point arithmetic in weighted progress calculations can produce values slightly above 100 (e.g., 100.00000000000001). This causes validation errors when the progress value is checked. Add min(100, ...) clamping to both size-weighted and count-based progress calculations to ensure the result never exceeds 100. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-02 21:18:36 +01:00
Stefan Agner	dc46b5b45b	Add registry manifest fetcher for size-based pull progress Fetch image manifests directly from container registries before pulling to get accurate layer sizes upfront. This enables size-weighted progress tracking where each layer contributes proportionally to its byte size, rather than equal weight per layer. Key changes: - Add RegistryManifestFetcher that handles auth discovery via WWW-Authenticate headers, token fetching with optional credentials, and multi-arch manifest list resolution - Update ImagePullProgress to accept manifest layer sizes via set_manifest() and calculate size-weighted progress - Fall back to count-based progress when manifest fetch fails - Pre-populate layer sizes from manifest when creating layer trackers The manifest fetcher supports ghcr.io, Docker Hub, and private registries by using credentials from Docker config when available. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-02 21:18:34 +01:00
Stefan Agner	204206640f	Exclude already-existing layers from pull progress calculation Layers that already exist locally should not count towards download progress since there's nothing to download for them. Only layers that need pulling are included in the progress calculation. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-02 21:17:03 +01:00
Stefan Agner	8c3ccfd83d	Fix pytest	2025-12-02 21:17:03 +01:00
Stefan Agner	2615199172	Use count-based progress for Docker image pulls Refactor Docker image pull progress to use a simpler count-based approach where each layer contributes equally (100% / total_layers) regardless of size. This replaces the previous size-weighted calculation that was susceptible to progress regression. The core issue was that Docker rate-limits concurrent downloads (~3 at a time) and reports layer sizes only when downloading starts. With size-weighted progress, large layers appearing late would cause progress to drop dramatically (e.g., 59% -> 29%) as the total size increased. The new approach: - Each layer contributes equally to overall progress - Per-layer progress: 70% download weight, 30% extraction weight - Progress only starts after first "Downloading" event (when layer count is known) - Always caps at 99% - job completion handles final 100% This simplifies the code by moving progress tracking to a dedicated module (pull_progress.py) and removing complex size-based scaling logic that tried to account for unknown layer sizes. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>	2025-12-02 21:16:57 +01:00
Stefan Agner	20f993e891	Avoid getting changed files for releases (#6381) The changed files GitHub Action is not available for release events, so we skip that step and directly set the output to false for releases. This restores how releases worked before #6374.	2025-12-02 20:23:37 +01:00
Stefan Agner	d220fa801f	Await aiodocker import_image coroutine (#6378) The aiodocker images.import_image() method returns a coroutine that needs to be awaited, but the code was iterating over it directly, causing "TypeError: 'coroutine' object is not iterable". Fixes SUPERVISOR-13D9 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude <noreply@anthropic.com>	2025-12-02 14:11:06 -05:00
Stefan Agner	abeee95eb1	Fix blocking I/O in git repository pull operation (#6380) Co-authored-by: Claude <noreply@anthropic.com>	2025-12-02 19:03:28 +01:00
Stefan Agner	50d31202ae	Use Docker's official registry domain detection logic (#6360) * Use Docker's official registry domain detection logic Replace the custom IMAGE_WITH_HOST regex with a proper implementation based on Docker's reference parser (vendor/github.com/distribution/reference/normalize.go). Changes: - Change DOCKER_HUB from "hub.docker.com" to "docker.io" (official default) - Add DOCKER_HUB_LEGACY for backward compatibility with "hub.docker.com" - Add IMAGE_DOMAIN_REGEX and get_domain() function that properly detects: - localhost (with optional port) - Domains with "." (e.g., ghcr.io, 127.0.0.1) - Domains with ":" port (e.g., myregistry:5000) - IPv6 addresses (e.g., [::1]:5000) - Update credential handling to support both docker.io and hub.docker.com - Add comprehensive tests for domain detection 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * Refactor Docker domain detection to utils module Move get_domain function to supervisor/docker/utils.py and rename it to get_domain_from_image for consistency with get_registry_for_image. Use named group in the regex for better readability. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * Rename domain to registry for consistency Use consistent "registry" terminology throughout the codebase: - Rename get_domain_from_image to get_registry_from_image - Rename IMAGE_DOMAIN_REGEX to IMAGE_REGISTRY_REGEX - Update named group from "domain" to "registry" - Update all related comments and variable names 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> --------- Co-authored-by: Claude <noreply@anthropic.com>	2025-12-02 14:30:03 +01:00
Jan Čermák	bac072a985	Use unpublished local wheels during PR builds (#6374) * Use unpublished local wheels during PR builds Refactor wheel building to use the new `local-wheels-repo-path` and move wheels building into a separate CI job. Wheels are only published on published (i.e. release or merged dev), for PR builds they are passed as artifacts to the build job instead. * Address review comments * Add trailing slash for wheels folder * Always run the changed_files check to ensure build_wheels runs on publish * Use full path for workflow and escape dots in changed files regexp	2025-12-02 14:08:07 +01:00
dependabot[bot]	2fc6a7dcab	Bump types-docker from 7.1.0.20251129 to 7.1.0.20251202 (#6376)	2025-12-02 07:36:51 +01:00
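
The commit messages above describe two ways of aggregating per-layer progress (count-based and size-weighted) plus the final clamp to 100. The following is a minimal standalone sketch of that arithmetic, not the Supervisor implementation: the `Layer` class, its fields, and the two helper functions are hypothetical names introduced for illustration, and the 70/30 split is taken from the commit message rather than from the actual `DOWNLOAD_WEIGHT` and `EXTRACT_WEIGHT` constants, whose values are not shown in this diff.

```python
from dataclasses import dataclass

# Assumed split, as stated in the commit message: 70% download, 30% extraction.
DOWNLOAD_WEIGHT = 70.0
EXTRACT_WEIGHT = 30.0


@dataclass
class Layer:
    """Hypothetical stand-in for a tracked layer (not the Supervisor class)."""

    total_size: int     # bytes, assumed known upfront from the manifest
    downloaded: int = 0  # bytes downloaded so far
    extracted: int = 0   # bytes extracted so far

    def progress(self) -> float:
        """Per-layer progress in percent."""
        if self.total_size == 0:
            return 0.0
        return (
            DOWNLOAD_WEIGHT * self.downloaded / self.total_size
            + EXTRACT_WEIGHT * self.extracted / self.total_size
        )


def count_based(layers: list[Layer]) -> float:
    """Every layer contributes equally, regardless of its size."""
    if not layers:
        return 100.0
    return min(100.0, sum(layer.progress() for layer in layers) / len(layers))


def size_weighted(layers: list[Layer]) -> float:
    """Every layer contributes proportionally to its byte size."""
    total = sum(layer.total_size for layer in layers)
    if total == 0:
        # No size information: fall back to equal weights.
        return count_based(layers)
    weighted = sum(layer.progress() * layer.total_size / total for layer in layers)
    # The weights are floats that only approximately sum to 1, so the result
    # can land marginally above 100; clamp it.
    return min(100.0, weighted)


# A 1 KB layer fully downloaded next to a 100 KB layer at 1% downloaded.
layers = [Layer(1000, downloaded=1000), Layer(100000, downloaded=1000)]
print(count_based(layers))    # roughly 35.35
print(size_weighted(layers))  # roughly 1.39
```

With equal weights the small finished layer pulls the total to about 35%, although only about 2% of the bytes have arrived; with size weights the same state reads as about 1.4%. This only works without regression when the sizes are known before the pull starts, which is what fetching the manifest provides.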