| 1 | +# Developer Guide: Using the docker-tools Infrastructure |
| 2 | + |
| 3 | +This guide walks through practical scenarios and workflows for the docker-tools infrastructure. The `eng/docker-tools` directory is a **shared infrastructure layer** used across all .NET Docker repositories (dotnet-docker, dotnet-buildtools-prereqs-docker, dotnet-framework-docker). It solves a fundamental challenge: building, testing, and publishing Docker images across multiple operating systems (Alpine, Ubuntu, Azure Linux, Windows Server variants), multiple CPU architectures (amd64, arm64, arm32), and multiple .NET versions, all while maintaining consistency and reliability. |
| 4 | + |
| 5 | +At its core, the infrastructure provides: |
| 6 | + |
| 7 | +- **PowerShell scripts** for local image building and Docker operations—so you can test Dockerfile changes on your machine before committing |
| 8 | +- **Azure Pipelines templates** for CI/CD (build, test, publish)—a composable template system that orchestrates builds across dozens of OS/architecture combinations in parallel |
| 9 | +- **ImageBuilder orchestration**—a specialized .NET tool that understands manifest files, manages image dependencies, handles multi-arch manifest creation, and coordinates the entire build process |
| 10 | +- **Caching and optimization**—intelligent systems that skip unchanged images and minimize redundant work |
| 11 | +- **SBOM generation**—automatic Software Bill of Materials creation for supply chain security |
| 12 | + |
| 13 | +The infrastructure handles complexity that would otherwise be overwhelming: a single commit to a repo can trigger builds of hundreds of image variants across Linux and Windows agents, each requiring proper build sequencing, testing, and eventual publication to Microsoft Artifact Registry (MAR). |
| 14 | + |
| 15 | +**Important:** Files in `eng/docker-tools/` are synchronized across repositories by automation in the [dotnet/docker-tools](https://github.com/dotnet/docker-tools) repository. If you need to make changes to this infrastructure, submit them there—changes made directly in consuming repos will be overwritten. |
| 16 | + |
| 17 | +--- |
| 18 | + |
| 19 | +## Local Development Scenarios |
| 20 | + |
| 21 | +### Scenario: Building Docker Images Locally |
| 22 | + |
| 23 | +The most common local task is building images to test Dockerfile changes before pushing. |
| 24 | + |
| 25 | +**Quick Build - All Images:** |
| 26 | +```powershell |
| 27 | +./eng/docker-tools/build.ps1 |
| 28 | +``` |
| 29 | + |
| 30 | +**Filter by OS:** |
| 31 | +```powershell |
| 32 | +# Build only Alpine images |
| 33 | +./eng/docker-tools/build.ps1 -OS "alpine" |
| 34 | + |
| 35 | +# Build Ubuntu 24.04 images |
| 36 | +./eng/docker-tools/build.ps1 -OS "noble" |
| 37 | +``` |
| 38 | + |
| 39 | +**Filter by Architecture:** |
| 40 | +```powershell |
| 41 | +# Build arm64 images only |
| 42 | +./eng/docker-tools/build.ps1 -Architecture "arm64" |
| 43 | +``` |
| 44 | + |
| 45 | +**Filter by Path:** |
| 46 | +```powershell |
| 47 | +# Build images from a specific directory |
| 48 | +./eng/docker-tools/build.ps1 -Paths "src/runtime/8.0/alpine3.20" |
| 49 | + |
| 50 | +# Build all 8.0 runtime images using glob pattern |
| 51 | +./eng/docker-tools/build.ps1 -Paths "*runtime*8.0*" |
| 52 | +``` |
| 53 | + |
| 54 | +**Combine Filters:** |
| 55 | +```powershell |
| 56 | +# Build .NET 8.0 Alpine arm64 images |
| 57 | +./eng/docker-tools/build.ps1 -Version "8.0" -OS "alpine" -Architecture "arm64" |
| 58 | +``` |
| 59 | + |
| 60 | +**Filter by Product Version (if applicable):** |
| 61 | +```powershell |
| 62 | +# Build only .NET 8.0 images |
| 63 | +./eng/docker-tools/build.ps1 -Version "8.0" |
| 64 | + |
| 65 | +# Build .NET 6.0 and 8.0 images |
| 66 | +./eng/docker-tools/build.ps1 -Version "6.0","8.0" |
| 67 | +``` |
| 68 | + |
| 69 | +### Understanding What Happens Under the Hood |
| 70 | + |
| 71 | +When you run [`build.ps1`](build.ps1), here's the chain of execution: |
| 72 | + |
| 73 | +``` |
| 74 | +build.ps1 |
| 75 | +  │ |
| 76 | +  ├── Translates your filter parameters into ImageBuilder CLI args |
| 77 | +  │ |
| 78 | +  └── Calls Invoke-ImageBuilder.ps1 "build --version X --os-version Y ..." |
| 79 | +        │ |
| 80 | +        ├── On Linux: Runs ImageBuilder in a Docker container |
| 81 | +        │     ├── Builds image: microsoft-dotnet-imagebuilder-withrepo |
| 82 | +        │     └── Mounts Docker socket and repo contents |
| 83 | +        │ |
| 84 | +        └── On Windows: Extracts ImageBuilder locally (due to Docker-in-Docker limitations) |
| 85 | +              └── Runs Microsoft.DotNet.ImageBuilder.exe directly |
| 86 | +``` |
| 87 | + |
| 88 | +### Scenario: Running ImageBuilder Directly |
| 89 | + |
| 90 | +For advanced scenarios, you may want to invoke ImageBuilder with specific commands: |
| 91 | + |
| 92 | +```powershell |
| 93 | +# Run any ImageBuilder command |
| 94 | +./eng/docker-tools/Invoke-ImageBuilder.ps1 "build --help" |
| 95 | + |
| 96 | +# Generate the build matrix (useful for debugging pipeline behavior) |
| 97 | +./eng/docker-tools/Invoke-ImageBuilder.ps1 "generateBuildMatrix --manifest manifest.json --type platformDependencyGraph" |
| 98 | + |
| 99 | +# Validate manifest syntax |
| 100 | +./eng/docker-tools/Invoke-ImageBuilder.ps1 "validateManifest --manifest manifest.json" |
| 101 | +``` |
| 102 | + |
| 103 | +--- |
| 104 | + |
| 105 | +## Understanding the Pipeline Architecture |
| 106 | + |
| 107 | +### The Build Flow |
| 108 | + |
| 109 | +The pipeline behaves differently depending on the build context: |
| 110 | + |
| 111 | +**Public PR Builds**: |
| 112 | +``` |
| 113 | +Build Stage |
| 114 | +  ├── PreBuildValidation |
| 115 | +  ├── GenerateBuildMatrix |
| 116 | +  └── Build Jobs (dry-run, no push) |
| 117 | +        └── Inline tests after each build |
| 118 | +      │ |
| 119 | +      ▼ |
| 120 | +Post_Build Stage |
| 121 | +  └── Merge artifacts |
| 122 | +      │ |
| 123 | +      ▼ |
| 124 | +Publish Stage (dry-run) |
| 125 | +  └── All publish operations run but skip actual pushes |
| 126 | +      │ |
| 127 | +      ▼ |
| 128 | +    (end) |
| 129 | +``` |
| 130 | +- Images are built but **not pushed** to any registry |
| 131 | +- Tests run inline within each build job |
| 132 | +- Publish stage runs in dry-run mode (validates publish logic without pushing) |
| 133 | +- Validates that Dockerfiles build successfully |
| 134 | + |
| 135 | +**Internal Official Builds**: |
| 136 | +``` |
| 137 | +Build Stage |
| 138 | +  ├── PreBuildValidation |
| 139 | +  ├── CopyBaseImages → staging ACR |
| 140 | +  ├── GenerateBuildMatrix |
| 141 | +  └── Build Jobs (push to staging ACR) |
| 142 | +      │ |
| 143 | +      ▼ |
| 144 | +Post_Build Stage |
| 145 | +  ├── Merge image info files |
| 146 | +  └── Consolidate SBOMs |
| 147 | +      │ |
| 148 | +      ▼ |
| 149 | +Test Stage |
| 150 | +  ├── GenerateTestMatrix |
| 151 | +  └── Test Jobs |
| 152 | +      │ |
| 153 | +      ▼ |
| 154 | +Publish Stage |
| 155 | +  ├── Copy images to production ACR |
| 156 | +  ├── Create multi-arch manifests |
| 157 | +  ├── Wait for MAR ingestion |
| 158 | +  ├── Update READMEs |
| 159 | +  ├── Publish image info to versions repo |
| 160 | +  └── Apply EOL annotations |
| 161 | +``` |
| 162 | +- Full pipeline with all stages |
| 163 | +- Images flow: `buildAcr` → `publishAcr` → MAR (see [`publish-config-prod.yml`](templates/stages/dotnet/publish-config-prod.yml) for ACR definitions) |
| 164 | +- Tests run against staged images |
| 165 | +- Only successful builds get published |
| 166 | + |
| 167 | +### Build Matrix Generation |
| 168 | + |
| 169 | +The `generateBuildMatrix` command is key to understanding how builds are parallelized. It: |
| 170 | + |
| 171 | +1. **Reads the manifest.json** - Understands which images exist |
| 172 | +2. **Builds a dependency graph** - Knows that `runtime-deps` must build before `runtime` |
| 173 | +3. **Groups by platform** - Creates jobs for each OS/Architecture combo |
| 174 | +4. **Optimizes with caching** - Can detect and exclude unchanged images |
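The dependency ordering in step 2 can be illustrated with a topological sort. This is a minimal sketch, not how ImageBuilder computes its graph: it feeds a hand-written edge list to the standard `tsort` utility. The `build_order` function name and the edge list are assumptions made for this example; only the `runtime-deps` before `runtime` relationship comes from the text above.

```bash
# Illustrative only: ImageBuilder derives dependencies from manifest.json and
# the Dockerfiles; this edge list is hand-written for the example.
build_order() {
  # Each line is "<dependency> <dependent>": the left image must build first.
  tsort <<'EOF'
runtime-deps runtime
runtime aspnet
EOF
}

# Prints one image name per line, dependencies first.
build_order
```

Because the edges form a single chain, the order is unambiguous: `runtime-deps`, then `runtime`, then `aspnet`.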
| 175 | + |
| 176 | +### Controlling Which Build Stages Run |
| 177 | + |
| 178 | +The `stages` variable is a comma-separated string that controls which pipeline stages execute: |
| 179 | + |
| 180 | +```yaml |
| 181 | +variables: |
| 182 | +- name: stages |
| 183 | +  value: "build,test,publish" # Run all stages |
| 184 | +``` |
| 185 | + |
| 186 | +Common patterns: |
| 187 | +- `"build"` - Build only, no tests or publishing |
| 188 | +- `"build,test"` - Build and test, but don't publish |
| 189 | +- `"publish"` - Publish only (when re-running a failed publish from a previous build) |
| 190 | +- `"build,test,publish"` - Full pipeline |
| 191 | + |
| 192 | +**Note:** The `Post_Build` stage is implicitly included whenever `build` is in the stages list. You don't need to specify it separately—it automatically runs after Build to merge image info files and consolidate SBOMs. |
| 193 | + |
| 194 | +The stages variable is useful for: |
| 195 | +- Re-running just the publish stage after fixing a transient failure |
| 196 | +- Skipping tests during initial development |
| 197 | +- Running isolated stages for debugging |
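The selection rule described above can be sketched as a small shell helper. This is a hedged illustration, not the pipeline templates' implementation: `stage_enabled` is a hypothetical function, the lowercase `post_build` spelling is chosen for this example, and the helper assumes the `stages` value contains no spaces.

```bash
# Hypothetical helper: succeeds (exit 0) if the given stage should run for a
# comma-separated stages value such as "build,test,publish".
stage_enabled() {
  local stages="$1" stage="$2"
  # Post_Build is implied whenever build is listed, per the note above.
  if [ "$stage" = "post_build" ]; then
    stage="build"
  fi
  # Wrap both sides in commas so "test" does not match inside another name.
  case ",${stages}," in
    *",${stage},"*) return 0 ;;
    *) return 1 ;;
  esac
}

if stage_enabled "build,test" "post_build"; then
  echo "Post_Build runs"
fi
```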
| 198 | + |
| 199 | +### Image Info Files: The Build's Memory |
| 200 | + |
| 201 | +Image info files (defined by [`ImageArtifactDetails`](https://github.com/dotnet/docker-tools/blob/main/src/ImageBuilder/Models/Image/ImageArtifactDetails.cs)) are the mechanism that tracks what was built: |
| 202 | + |
| 203 | +```json |
| 204 | +{ |
| 205 | +  "repos": [{ |
| 206 | +    "repo": "dotnet/runtime", |
| 207 | +    "images": [{ |
| 208 | +      "platforms": [{ |
| 209 | +        "dockerfile": "src/runtime/8.0/alpine3.20/amd64/Dockerfile", |
| 210 | +        "digest": "sha256:abc123...", |
| 211 | +        "created": "2024-01-15T10:30:00Z", |
| 212 | +        "commitUrl": "https://github.com/dotnet/dotnet-docker/commit/..." |
| 213 | +      }] |
| 214 | +    }] |
| 215 | +  }] |
| 216 | +} |
| 217 | +``` |
| 218 | + |
| 219 | +**How they flow through the pipeline:** |
| 220 | +1. **Build stage**: Each build job produces an image-info fragment |
| 221 | +2. **Post_Build stage**: Fragments are merged into a single `image-info.json` |
| 222 | +3. **Test stage**: Uses merged info to know which images to test |
| 223 | +4. **Publish stage**: Uses info to know which images to copy/publish |
| 224 | +5. **Versions repo**: Final info is committed to the versions repo |
| 225 | + |
| 226 | +The [versions repo](https://github.com/dotnet/versions) stores the "source of truth" image info. Future builds compare against this to determine what's changed and skip unchanged images. |
| 227 | + |
| 228 | +**Using Image Info for Investigations** |
| 229 | + |
| 230 | +Image info files are invaluable when you need to track down information about a specific image, particularly when starting from a digest reported by a customer or security scan. |
| 231 | + |
| 232 | +*Scenario: "Which commit produced this image?"* |
| 233 | + |
| 234 | +Given a digest like `sha256:abc123...`, you can trace it back to its source: |
| 235 | + |
| 236 | +1. **Check the versions repo history** - The `dotnet/versions` repo contains historical image info committed after each publish. Use `git log -p --all -S 'sha256:abc123'` to find the commit that introduced this digest. |
| 237 | + |
| 238 | +2. **From the image info entry**, you'll find: |
| 239 | +   - `commitUrl` - The exact source commit that built this image |
| 240 | +   - `dockerfile` - Which Dockerfile produced it |
| 241 | +   - `created` - When it was built |
| 242 | +   - `simpleTags` - The tags applied to this image |
| 243 | + |
| 244 | +*Scenario: "What was in the last successful build?"* |
| 245 | + |
| 246 | +Download the `image-info` artifact from a pipeline run in Azure DevOps: |
| 247 | +1. Navigate to the pipeline run |
| 248 | +2. Go to the "Published" artifacts section |
| 249 | +3. Download `image-info` (merged) or individual `*-image-info-*` fragments |
| 250 | + |
| 251 | +*Scenario: "When did we last publish updates to a specific image?"* |
| 252 | + |
| 253 | +Use the versions repo git history: |
| 254 | +```bash |
| 255 | +# In the dotnet/versions repo |
| 256 | +git log --oneline -- build-info/docker/image-info.dotnet-dotnet-docker-main.json |
| 257 | +``` |
| 258 | + |
| 259 | +Each commit corresponds to a publish operation and includes the full image info at that point in time. |
| 260 | + |
| 261 | +*Scenario: "Compare what changed between two publishes"* |
| 262 | + |
| 263 | +```bash |
| 264 | +git diff <commit1> <commit2> -- build-info/docker/image-info.dotnet-dotnet-docker-main.json |
| 265 | +``` |
| 266 | + |
| 267 | +This shows which images were added, removed, or rebuilt (new digests) between the two publishes. |
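When the raw diff is noisy, it can help to list only the digests that are new. The following is a rough sketch under stated assumptions: `new_digests` is a hypothetical helper, the two input files are local copies of the image info at each commit (for example saved with `git show <commit>:<path>`), and it matches any `sha256:` value in the file rather than only the `digest` field.

```bash
# Hypothetical helper: print sha256 values that appear in the new image-info
# file but not in the old one. Matches any sha256 value, not only "digest".
new_digests() {
  local old_file="$1" new_file="$2" digest
  grep -o 'sha256:[0-9a-f]*' "$new_file" | sort -u | while read -r digest; do
    # Fixed-string search; full-length digests cannot be prefixes of each other.
    grep -q -F -- "$digest" "$old_file" || printf '%s\n' "$digest"
  done
}

# Usage (file names are placeholders):
#   new_digests old-image-info.json new-image-info.json
```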
| 268 | + |
| 269 | +### The Publish Flow in Detail |
| 270 | + |
| 271 | +The publish stage does more than just push images. Here's the sequence: |
| 272 | + |
| 273 | +1. **Copy Images** — `copyAcrImages` copies from build ACR to publish ACR |
| 274 | +2. **Publish Manifest** — `publishManifest` creates multi-arch manifest lists |
| 275 | +3. **Wait for MAR Ingestion** — Polls MAR until images are available (timeout configurable) |
| 276 | +4. **Publish READMEs** — Updates documentation in the registry |
| 277 | +5. **Wait for Doc Ingestion** — Ensures README changes are live |
| 278 | +6. **Merge & Publish Image Info** — Updates the versions repo with new image metadata |
| 279 | +7. **Ingest Kusto Image Info** — Sends telemetry to Kusto for analytics |
| 280 | +8. **Generate & Apply EOL Annotations** — Marks images with end-of-life dates |
| 281 | +9. **Post Publish Notification** — Creates GitHub issues/notifications about the publish |
| 282 | + |
| 283 | +### Dry-Run Mode |
| 284 | + |
| 285 | +For testing pipeline changes without actually publishing: |
| 286 | + |
| 287 | +```yaml |
| 288 | +# In pipeline variables or at runtime |
| 289 | +variables: |
| 290 | +- name: dryRunArg |
| 291 | +  value: "--dry-run" |
| 292 | +``` |
| 293 | + |
| 294 | +The infrastructure also enables dry-run automatically for: |
| 295 | +- Pull request builds |
| 296 | +- Builds from non-official branches |
| 297 | +- Public project builds |
| 298 | + |
| 299 | +The [`set-dry-run.yml`](templates/steps/set-dry-run.yml) step template determines this automatically based on context. |
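The decision can be pictured roughly as follows. This is a sketch under assumptions, not the contents of `set-dry-run.yml`: `resolve_dry_run_arg` is a hypothetical function, and the project name `public` and the list of official branches are placeholders. In a pipeline, the three inputs would typically come from the predefined variables `Build.Reason`, `System.TeamProject`, and `Build.SourceBranch`.

```bash
# Hypothetical helper: print "--dry-run" when a build should not publish,
# or an empty line when it may publish for real.
resolve_dry_run_arg() {
  local reason="$1" project="$2" branch="$3" official
  # Placeholder values for illustration only.
  local public_project="public"
  local official_branches="refs/heads/main refs/heads/nightly"

  # Pull request builds and public project builds are always dry-run.
  if [ "$reason" = "PullRequest" ] || [ "$project" = "$public_project" ]; then
    echo "--dry-run"
    return 0
  fi
  # Builds from an official branch publish for real.
  for official in $official_branches; do
    if [ "$branch" = "$official" ]; then
      echo ""
      return 0
    fi
  done
  # Anything else is treated as a non-official branch.
  echo "--dry-run"
}

resolve_dry_run_arg "PullRequest" "internal" "refs/heads/main"
```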
| 300 | + |
| 301 | +--- |
| 302 | + |
| 303 | +## Automatic Image Rebuilds |
| 304 | + |
| 305 | +The infrastructure includes automation that monitors for base image updates and triggers rebuilds when dependencies change. |
| 306 | + |
| 307 | +### How It Works |
| 308 | + |
| 309 | +A scheduled pipeline ([`check-base-image-updates.yml`](https://github.com/dotnet/docker-tools/blob/main/eng/pipelines/check-base-image-updates.yml)) runs every 4 hours and: |
| 310 | + |
| 311 | +1. **Checks for stale images** — Compares the base image digests used in our published images against the current digests in upstream registries |
| 312 | +2. **Identifies affected images** — Determines which of our images need rebuilding because their base image changed |
| 313 | +3. **Queues targeted builds** — Automatically triggers builds for only the affected images, not the entire repo |
| 314 | + |
| 315 | +This ensures that security patches and updates in base images (like `alpine`, `ubuntu`, `mcr.microsoft.com/windows/nanoserver`) flow through to images without manual intervention. |
| 316 | + |
| 317 | +### Failure Handling and Recovery |
| 318 | + |
| 319 | +The system has built-in retry logic but requires manual intervention after repeated failures: |
| 320 | + |
| 321 | +**Automatic retry behavior:** |
| 322 | +- If a triggered build fails, the system will attempt to rebuild every 4 hours |
| 323 | +- After **3 unsuccessful attempts**, the system stops queuing new builds for that image |
| 324 | +- This prevents endless rebuild loops when there's a genuine issue requiring human attention |
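The retry cutoff can be summarized as a simple predicate. This is an illustrative sketch only: `should_queue_rebuild` and its inputs are hypothetical, and how the real automation counts failed attempts is not shown here. The limit of 3 comes from the text above.

```bash
MAX_FAILED_ATTEMPTS=3  # limit described above

# Hypothetical predicate: succeeds (exit 0) when a rebuild should be queued.
#   $1 - "true" if the base image digest changed upstream
#   $2 - number of consecutive failed rebuild attempts so far
should_queue_rebuild() {
  local base_changed="$1" failed_attempts="$2"
  [ "$base_changed" = "true" ] && [ "$failed_attempts" -lt "$MAX_FAILED_ATTEMPTS" ]
}

if should_queue_rebuild "true" 2; then
  echo "queue rebuild"
fi
```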
| 325 | + |
| 326 | +**After fixing the issue:** |
| 327 | + |
| 328 | +Once you've fixed the underlying problem (Dockerfile change, test fix, etc.) and have a successful build: |
| 329 | + |
| 330 | +1. Navigate to the successful pipeline run in Azure DevOps |
| 331 | +2. Add the `autobuilder` label to that run |
| 332 | +3. This signals to the infrastructure that a successful build has occurred |
| 333 | +4. The system will resume automatic rebuilds for that image as needed |
| 334 | + |
| 335 | +The `autobuilder` label is how the infrastructure tracks that the failure cycle has been broken and normal operations can resume. |
| 336 | + |
| 337 | +--- |
| 338 | + |
| 339 | +## Common Customization Patterns |
| 340 | + |
| 341 | +### Pattern: Adding Build Arguments |
| 342 | + |
| 343 | +Pass additional arguments to Docker builds via ImageBuilder: |
| 344 | + |
| 345 | +```yaml |
| 346 | +customBuildInitSteps: |
| 347 | +- powershell: | |
| 348 | +    # Use a custom variable name; $args is a PowerShell automatic variable |
| 349 | +    $buildArgs = "--build-arg MY_VAR=value" |
| 349 | +    echo "##vso[task.setvariable variable=imageBuilderBuildArgs]$buildArgs" |
| 350 | +``` |
| 351 | + |
| 352 | +### Pattern: Re-running Stages with `stages` and `sourceBuildPipelineRunId` |
| 353 | + |
| 354 | +A powerful pattern is combining the `stages` variable with the `sourceBuildPipelineRunId` pipeline parameter to run specific stages using artifacts from a previous build. This is useful for: |
| 355 | +1. Skipping stages you don't need to run |
| 356 | +2. Avoiding unnecessary re-builds after test/publishing infrastructure fixes |
| 357 | + |
| 358 | +Note: For simple retries of failed jobs, use the Azure Pipelines UI "Re-run failed jobs" feature instead. |
| 359 | + |
| 360 | +**Scenario: Test failed, need to run publish anyway** |
| 361 | + |
| 362 | +* Set `sourceBuildPipelineRunId` to the ID of the run that built the images |
| 363 | +* Set `stages` to `publish` |
| 364 | + |
| 365 | +**How it works:** |
| 366 | + |
| 367 | +1. `sourceBuildPipelineRunId` tells the pipeline which previous run to pull artifacts from |
| 368 | +2. The [`download-build-artifact.yml`](templates/steps/download-build-artifact.yml) step uses this ID to fetch `image-info.json` from that run |
| 369 | +3. Specified stage(s) use the downloaded image info to know which images exist |
| 370 | + |
| 371 | +**Common recovery patterns:** |
| 372 | + |
| 373 | +| Scenario | stages | sourceBuildPipelineRunId | |
| 374 | +|----------|--------|--------------------------| |
| 375 | +| Normal full build | `"build,test,publish"` | `$(Build.BuildId)` (default) | |
| 376 | +| Re-run publish after infra fix | `"publish"` | ID of the successful build run | |
| 377 | +| Re-test after infra fix | `"test"` | ID of the build run to test | |
| 378 | +| Build only (no publish) | `"build"` | `$(Build.BuildId)` (default) | |
| 379 | +| Test + publish (skip build) | `"test,publish"` | ID of the build run | |
| 380 | + |
| 381 | +**In the Azure DevOps UI:** |
| 382 | + |
| 383 | +When you queue a new run, you can override these as runtime parameters: |
| 384 | +1. Set `stages` to the stage(s) you want to run |
| 385 | +2. Set `sourceBuildPipelineRunId` to the run ID containing the artifacts you need (find the build ID in the URL when viewing a pipeline run, e.g., `buildId=123456`) |
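If you have the run URL on hand, the ID can be pulled out with a one-liner. This is a small convenience sketch: `build_id_from_url` is a hypothetical helper, and the organization and project in the sample URL are placeholders.

```bash
# Hypothetical helper: print the numeric buildId query parameter of an
# Azure DevOps run URL, or nothing if the URL has no buildId.
build_id_from_url() {
  local url="$1"
  printf '%s\n' "$url" | sed -n 's/.*[?&]buildId=\([0-9][0-9]*\).*/\1/p'
}

build_id_from_url 'https://dev.azure.com/example-org/example-project/_build/results?buildId=123456&view=results'
```

The printed value is what you would enter for `sourceBuildPipelineRunId`.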
| 386 | + |
| 387 | +This avoids the multi-hour rebuild cycle when you just need to retry a failed operation. |