Commit f621eff

Update common Docker engineering infrastructure with latest
1 parent f5789ab commit f621eff

3 files changed: +391 -2 lines changed

eng/docker-tools/DEV-GUIDE.md

Lines changed: 387 additions & 0 deletions
@@ -0,0 +1,387 @@
# Developer Guide: Using the docker-tools Infrastructure

This guide walks through practical scenarios and workflows for using the docker-tools infrastructure. The `eng/docker-tools` directory is a **shared infrastructure layer** used across all .NET Docker repositories (dotnet-docker, dotnet-buildtools-prereqs-docker, dotnet-framework-docker). It solves a fundamental challenge: building, testing, and publishing Docker images across multiple operating systems (Alpine, Ubuntu, Azure Linux, Windows Server variants), multiple CPU architectures (amd64, arm64, arm32), and multiple .NET versions, all while maintaining consistency and reliability.

At its core, the infrastructure provides:

- **PowerShell scripts** for local image building and Docker operations, so you can test Dockerfile changes on your machine before committing
- **Azure Pipelines templates** for CI/CD (build, test, publish): a composable template system that orchestrates builds across dozens of OS/architecture combinations in parallel
- **ImageBuilder orchestration**: a specialized .NET tool that understands manifest files, manages image dependencies, handles multi-arch manifest creation, and coordinates the entire build process
- **Caching and optimization**: systems that skip unchanged images and minimize redundant work
- **SBOM generation**: automatic Software Bill of Materials creation for supply chain security

The infrastructure handles complexity that would otherwise be overwhelming: a single commit to a repo can trigger builds of hundreds of image variants across Linux and Windows agents, each requiring proper build sequencing, testing, and eventual publication to Microsoft Artifact Registry (MAR).

**Important:** Files in `eng/docker-tools/` are synchronized across repositories by automation in the [dotnet/docker-tools](https://github.com/dotnet/docker-tools) repository. If you need to make changes to this infrastructure, submit them there; changes made directly in consuming repos will be overwritten.

---

## Local Development Scenarios

### Scenario: Building Docker Images Locally

The most common local task is building images to test Dockerfile changes before pushing.

**Quick Build - All Images:**
```powershell
./eng/docker-tools/build.ps1
```

**Filter by OS:**
```powershell
# Build only Alpine images
./eng/docker-tools/build.ps1 -OS "alpine"

# Build Ubuntu 24.04 images
./eng/docker-tools/build.ps1 -OS "noble"
```

**Filter by Architecture:**
```powershell
# Build arm64 images only
./eng/docker-tools/build.ps1 -Architecture "arm64"
```

**Filter by Path:**
```powershell
# Build images from a specific directory
./eng/docker-tools/build.ps1 -Paths "src/runtime/8.0/alpine3.20"

# Build all 8.0 runtime images using glob pattern
./eng/docker-tools/build.ps1 -Paths "*runtime*8.0*"
```

**Combine Filters:**
```powershell
# Build .NET 8.0 Alpine arm64 images
./eng/docker-tools/build.ps1 -Version "8.0" -OS "alpine" -Architecture "arm64"
```

**Filter by Product Version (if applicable):**
```powershell
# Build only .NET 8.0 images
./eng/docker-tools/build.ps1 -Version "8.0"

# Build .NET 6.0 and 8.0 images
./eng/docker-tools/build.ps1 -Version "6.0","8.0"
```

### Understanding What Happens Under the Hood

When you run [`build.ps1`](build.ps1), here's the chain of execution:

```
build.ps1
    │
    ├── Translates your filter parameters into ImageBuilder CLI args
    │
    └── Calls Invoke-ImageBuilder.ps1 "build --version X --os-version Y ..."
            │
            ├── On Linux: Runs ImageBuilder in a Docker container
            │   └── Builds image: microsoft-dotnet-imagebuilder-withrepo
            │   └── Mounts Docker socket and repo contents
            │
            └── On Windows: Extracts ImageBuilder locally (due to Docker-in-Docker limitations)
                └── Runs Microsoft.DotNet.ImageBuilder.exe directly
```

### Scenario: Running ImageBuilder Directly

For advanced scenarios, you may want to invoke ImageBuilder with specific commands:

```powershell
# Run any ImageBuilder command
./eng/docker-tools/Invoke-ImageBuilder.ps1 "build --help"

# Generate the build matrix (useful for debugging pipeline behavior)
./eng/docker-tools/Invoke-ImageBuilder.ps1 "generateBuildMatrix --manifest manifest.json --type platformDependencyGraph"

# Validate manifest syntax
./eng/docker-tools/Invoke-ImageBuilder.ps1 "validateManifest --manifest manifest.json"
```

---

## Understanding the Pipeline Architecture

### The Build Flow

The pipeline behaves differently depending on the build context:

**Public PR Builds**:
```
Build Stage
    ├── PreBuildValidation
    ├── GenerateBuildMatrix
    └── Build Jobs (dry-run, no push)
        └── Inline tests after each build
    │
    ▼
Post_Build Stage
    └── Merge artifacts
    │
    ▼
Publish Stage (dry-run)
    └── All publish operations run but skip actual pushes
    │
    ▼
(end)
```
- Images are built but **not pushed** to any registry
- Tests run inline within each build job
- Publish stage runs in dry-run mode (validates publish logic without pushing)
- Validates that Dockerfiles build successfully

**Internal Official Builds**:
```
Build Stage
    ├── PreBuildValidation
    ├── CopyBaseImages → staging ACR
    ├── GenerateBuildMatrix
    └── Build Jobs (push to staging ACR)
    │
    ▼
Post_Build Stage
    ├── Merge image info files
    └── Consolidate SBOMs
    │
    ▼
Test Stage
    ├── GenerateTestMatrix
    └── Test Jobs
    │
    ▼
Publish Stage
    ├── Copy images to production ACR
    ├── Create multi-arch manifests
    ├── Wait for MAR ingestion
    ├── Update READMEs
    ├── Publish image info to versions repo
    └── Apply EOL annotations
```
- Full pipeline with all stages
- Images flow: `buildAcr` → `publishAcr` → MAR (see [`publish-config-prod.yml`](templates/stages/dotnet/publish-config-prod.yml) for ACR definitions)
- Tests run against staged images
- Only successful builds get published

### Build Matrix Generation

The `generateBuildMatrix` command is key to understanding how builds are parallelized. It:

1. **Reads the manifest.json** - Understands which images exist
2. **Builds a dependency graph** - Knows that `runtime-deps` must build before `runtime`
3. **Groups by platform** - Creates jobs for each OS/Architecture combo
4. **Optimizes with caching** - Can detect and exclude unchanged images

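The dependency-ordering idea in step 2 can be illustrated with standard tools. This is a minimal sketch, not ImageBuilder's actual algorithm: it feeds "base image, dependent image" pairs to the POSIX `tsort` utility, which prints an order in which every image appears after the image it is built from. Only the `runtime-deps` before `runtime` edge comes from this guide; the `aspnet` edge and the `build_order` function name are illustrative assumptions.

```bash
#!/usr/bin/env bash
# Sketch only: order images so that each base image is built before its dependents.
# Input (stdin): one "base dependent" pair per line.
# Output: one image name per line, in a valid build order.
build_order() {
  tsort
}

# Example chain: runtime-deps -> runtime -> aspnet (the aspnet edge is illustrative).
printf '%s\n' 'runtime-deps runtime' 'runtime aspnet' | build_order
```

For a simple chain like this one there is only one valid order, so the example prints the three names from base image to most dependent image.
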
### Controlling Which Build Stages Run

The `stages` variable is a comma-separated string that controls which pipeline stages execute:

```yaml
variables:
- name: stages
  value: "build,test,publish"    # Run all stages
```

Common patterns:
- `"build"` - Build only, no tests or publishing
- `"build,test"` - Build and test, but don't publish
- `"publish"` - Publish only (when re-running a failed publish from a previous build)
- `"build,test,publish"` - Full pipeline

**Note:** The `Post_Build` stage is implicitly included whenever `build` is in the stages list. You don't need to specify it separately; it runs automatically after Build to merge image info files and consolidate SBOMs.

The `stages` variable is useful for:
- Re-running just the publish stage after fixing a transient failure
- Skipping tests during initial development
- Running isolated stages for debugging

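To make the selection rule concrete, here is a minimal sketch of how a comma-separated `stages` value could be interpreted, including the implicit `Post_Build` behavior described in the note above. The `stage_enabled` function and the lowercase `post_build` name are hypothetical; the real templates evaluate this inside Azure Pipelines conditions and may match differently.

```bash
#!/usr/bin/env bash
# Sketch only: decide whether a stage runs for a given comma-separated `stages` value.
# Usage: stage_enabled "<stages>" "<stage>"; exit status 0 means the stage runs.
stage_enabled() {
  local stages="$1" stage="$2"
  # Post_Build is implied by build, so look up "build" instead.
  if [ "$stage" = "post_build" ]; then
    stage="build"
  fi
  # Wrap both sides in commas so a name only matches a whole list entry.
  case ",${stages}," in
    *",${stage},"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Show which stages would run for stages="build,test".
for s in build post_build test publish; do
  if stage_enabled "build,test" "$s"; then
    echo "$s: runs"
  else
    echo "$s: skipped"
  fi
done
```

With `stages` set to `"build,test"`, the loop reports that `build`, `post_build`, and `test` run and that `publish` is skipped.
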
### Image Info Files: The Build's Memory

Image info files (defined by [`ImageArtifactDetails`](https://github.com/dotnet/docker-tools/blob/main/src/ImageBuilder/Models/Image/ImageArtifactDetails.cs)) are the mechanism that tracks what was built:

```json
{
  "repos": [{
    "repo": "dotnet/runtime",
    "images": [{
      "platforms": [{
        "dockerfile": "src/runtime/8.0/alpine3.20/amd64/Dockerfile",
        "digest": "sha256:abc123...",
        "created": "2024-01-15T10:30:00Z",
        "commitUrl": "https://github.com/dotnet/dotnet-docker/commit/..."
      }]
    }]
  }]
}
```

**How they flow through the pipeline:**
1. **Build stage**: Each build job produces an image-info fragment
2. **Post_Build stage**: Fragments are merged into a single `image-info.json`
3. **Test stage**: Uses merged info to know which images to test
4. **Publish stage**: Uses info to know which images to copy/publish
5. **Versions repo**: Final info is committed to the versions repo

The [versions repo](https://github.com/dotnet/versions) stores the "source of truth" image info. Future builds compare against this to determine what's changed and skip unchanged images.

**Using Image Info for Investigations**

Image info files are invaluable when you need to track down information about a specific image, particularly when starting from a digest reported by a customer or security scan.

*Scenario: "Which commit produced this image?"*

Given a digest like `sha256:abc123...`, you can trace it back to its source:

1. **Check the versions repo history** - The `dotnet/versions` repo contains historical image info committed after each publish. Use `git log -p --all -S 'sha256:abc123'` to find the commit that introduced this digest.

2. **From the image info entry**, you'll find:
   - `commitUrl` - The exact source commit that built this image
   - `dockerfile` - Which Dockerfile produced it
   - `created` - When it was built
   - `simpleTags` - The tags applied to this image

*Scenario: "What was in the last successful build?"*

Download the `image-info` artifact from a pipeline run in Azure DevOps:
1. Navigate to the pipeline run
2. Go to the "Published" artifacts section
3. Download `image-info` (merged) or individual `*-image-info-*` fragments

*Scenario: "When did we last publish updates to a specific image?"*

Use the versions repo git history:
```bash
# In the dotnet/versions repo
git log --oneline -- build-info/docker/image-info.dotnet-dotnet-docker-main.json
```

Each commit corresponds to a publish operation and includes the full image info at that point in time.

*Scenario: "Compare what changed between two publishes"*

```bash
git diff <commit1> <commit2> -- build-info/docker/image-info.dotnet-dotnet-docker-main.json
```

This shows which images were added, removed, or rebuilt (new digests) between the two publishes.

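When the raw diff is too noisy, a narrower question is "which digests are new?". The following is a minimal sketch, assuming you have saved two snapshots of the image info file to disk (for example with `git show <commit>:<path> > old.json`). The `list_digests` and `new_digests` helpers are hypothetical, and the text matching relies only on the `"digest": "..."` shape shown in the sample above, so it is an approximation rather than a real JSON parser.

```bash
#!/usr/bin/env bash
# Sketch only: list digests that appear in a newer image info file but not in an older one.

# Print every "digest" value found in an image info file, sorted and de-duplicated.
list_digests() {
  grep -o '"digest"[[:space:]]*:[[:space:]]*"[^"]*"' "$1" \
    | sed 's/.*"\([^"]*\)"$/\1/' \
    | sort -u
}

# Usage: new_digests <old-image-info.json> <new-image-info.json>
new_digests() {
  local old_list new_list
  old_list="$(mktemp)"
  new_list="$(mktemp)"
  list_digests "$1" > "$old_list"
  list_digests "$2" > "$new_list"
  # comm -13 keeps only the lines that are unique to the second (newer) list.
  comm -13 "$old_list" "$new_list"
  rm -f "$old_list" "$new_list"
}
```

Each digest printed by `new_digests old.json new.json` can then be looked up in the newer file to find its `dockerfile` and `commitUrl`.
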
### The Publish Flow in Detail

The publish stage does more than just push images. Here's the sequence:

1. **Copy Images** - `copyAcrImages` copies from build ACR to publish ACR
2. **Publish Manifest** - `publishManifest` creates multi-arch manifest lists
3. **Wait for MAR Ingestion** - Polls MAR until images are available (timeout configurable)
4. **Publish READMEs** - Updates documentation in the registry
5. **Wait for Doc Ingestion** - Ensures README changes are live
6. **Merge & Publish Image Info** - Updates the versions repo with new image metadata
7. **Ingest Kusto Image Info** - Sends telemetry to Kusto for analytics
8. **Generate & Apply EOL Annotations** - Marks images with end-of-life dates
9. **Post Publish Notification** - Creates GitHub issues/notifications about the publish

### Dry-Run Mode

For testing pipeline changes without actually publishing:

```yaml
# In pipeline variables or at runtime
variables:
- name: dryRunArg
  value: "--dry-run"
```

The infrastructure also enables dry-run automatically for:
- Pull request builds
- Builds from non-official branches
- Public project builds

The [`set-dry-run.yml`](templates/steps/set-dry-run.yml) step template determines this automatically based on context.

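The three automatic conditions can be read as a simple "any of these" rule. The sketch below is an assumption-laden illustration, not the contents of `set-dry-run.yml`: the `dry_run_arg` function is hypothetical, `PullRequest` is the value Azure Pipelines reports in `Build.Reason` for PR builds, the project name `public` is an assumed name for the public project, and the official-branch flag is assumed to be computed elsewhere.

```bash
#!/usr/bin/env bash
# Sketch only: choose the dry-run argument from build context.
# Usage: dry_run_arg <build-reason> <team-project> <is-official-branch: true|false>
# Prints "--dry-run" when any dry-run condition holds, otherwise prints nothing.
dry_run_arg() {
  local reason="$1" project="$2" official_branch="$3"
  if [ "$reason" = "PullRequest" ] \
    || [ "$project" = "public" ] \
    || [ "$official_branch" != "true" ]; then
    echo "--dry-run"
  fi
}
```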

---

## Automatic Image Rebuilds

The infrastructure includes automation that monitors for base image updates and triggers rebuilds when dependencies change.

### How It Works

A scheduled pipeline ([`check-base-image-updates.yml`](https://github.com/dotnet/docker-tools/blob/main/eng/pipelines/check-base-image-updates.yml)) runs every 4 hours and:

1. **Checks for stale images** - Compares the base image digests used in our published images against the current digests in upstream registries
2. **Identifies affected images** - Determines which of our images need rebuilding because their base image changed
3. **Queues targeted builds** - Automatically triggers builds for only the affected images, not the entire repo

This ensures that security patches and updates in base images (like `alpine`, `ubuntu`, `mcr.microsoft.com/windows/nanoserver`) flow through to our images without manual intervention.

### Failure Handling and Recovery

The system has built-in retry logic but requires manual intervention after repeated failures:

**Automatic retry behavior:**
- If a triggered build fails, the system will attempt to rebuild every 4 hours
- After **3 unsuccessful attempts**, the system stops queuing new builds for that image
- This prevents endless rebuild loops when there's a genuine issue requiring human attention

326+
**After fixing the issue:**
327+
328+
Once you've fixed the underlying problem (Dockerfile change, test fix, etc.) and have a successful build:
329+
330+
1. Navigate to the successful pipeline run in Azure DevOps
331+
2. Add the `autobuilder` label to that run
332+
3. This signals to the infrastructure that a successful build has occurred
333+
4. The system will resume automatic rebuilds for that image as needed
334+
335+
The `autobuilder` label is how the infrastructure tracks that the failure cycle has been broken and normal operations can resume.
336+
337+
---
338+
## Common Customization Patterns

### Pattern: Adding Build Arguments

Pass additional arguments to Docker builds via ImageBuilder:

```yaml
customBuildInitSteps:
  - powershell: |
      # Use a dedicated name: $args is a PowerShell automatic variable
      $buildArgs = "--build-arg MY_VAR=value"
      echo "##vso[task.setvariable variable=imageBuilderBuildArgs]$buildArgs"
```

### Pattern: Re-running Stages with `stages` and `sourceBuildPipelineRunId`

A powerful pattern is combining the `stages` variable with the `sourceBuildPipelineRunId` pipeline parameter to run specific stages using artifacts from a previous build. This is useful for:
1. Skipping stages you don't need to run
2. Avoiding unnecessary re-builds after test/publishing infrastructure fixes

Note: For simple retries of failed jobs, use the Azure Pipelines UI "Re-run failed jobs" feature instead.

**Scenario: Test failed, need to run publish anyway**

* Set `sourceBuildPipelineRunId` to the ID of the run that built the images
* Set `stages` to `publish`

**How it works:**

1. `sourceBuildPipelineRunId` tells the pipeline which previous run to pull artifacts from
2. The [`download-build-artifact.yml`](templates/steps/download-build-artifact.yml) step uses this ID to fetch `image-info.json` from that run
3. Specified stage(s) use the downloaded image info to know which images exist

**Common recovery patterns:**

| Scenario | stages | sourceBuildPipelineRunId |
|----------|--------|--------------------------|
| Normal full build | `"build,test,publish"` | `$(Build.BuildId)` (default) |
| Re-run publish after infra fix | `"publish"` | ID of the successful build run |
| Re-test after infra fix | `"test"` | ID of the build run to test |
| Build only (no publish) | `"build"` | `$(Build.BuildId)` (default) |
| Test + publish (skip build) | `"test,publish"` | ID of the build run |

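The table implies one rule worth checking before you queue a run: if `build` is not in `stages`, the artifacts have to come from an earlier run. The following is a minimal sketch of that check; the `check_rerun` helper is hypothetical and is not part of the pipeline templates.

```bash
#!/usr/bin/env bash
# Sketch only: sanity-check a stages / sourceBuildPipelineRunId combination before queuing.
# Usage: check_rerun <stages> <source-run-id> <current-run-id>
# Writes a message to stderr and returns 1 when the combination cannot work.
check_rerun() {
  local stages="$1" source_id="$2" current_id="$3"
  case ",${stages}," in
    *",build,"*)
      # This run builds its own images, so it is its own artifact source.
      return 0 ;;
  esac
  # Without a build stage, artifacts must come from an earlier run.
  if [ -z "$source_id" ] || [ "$source_id" = "$current_id" ]; then
    echo "stages='${stages}' needs sourceBuildPipelineRunId set to an earlier run" >&2
    return 1
  fi
  return 0
}
```

For example, `check_rerun "publish" 123456 123999` succeeds, while `check_rerun "publish" 123999 123999` fails because the current run has no build artifacts of its own.
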
**In the Azure DevOps UI:**

When you queue a new run, you can override these as runtime parameters:
1. Set `stages` to the stage(s) you want to run
2. Set `sourceBuildPipelineRunId` to the run ID containing the artifacts you need (find the build ID in the URL when viewing a pipeline run, e.g., `buildId=123456`)

This avoids the multi-hour rebuild cycle when you just need to retry a failed operation.

eng/docker-tools/readme.md

Lines changed: 3 additions & 1 deletion
@@ -25,4 +25,6 @@
 
 !!! Changes made in this directory are subject to being overwritten by automation !!!
 
-The files in this directory are shared by all .NET Docker repos. If you need to make changes to these files, open an issue or submit a pull request in https://github.com/dotnet/docker-tools.
+The files in this directory are shared by all .NET Docker repos. If you need to make changes to these files, open an issue or submit a pull request in https://github.com/dotnet/docker-tools.
+
+For guidance on using this infrastructure, see the [Developer Guide](DEV-GUIDE.md).
