387 changes: 387 additions & 0 deletions eng/docker-tools/DEV-GUIDE.md
@@ -0,0 +1,387 @@
# Developer Guide: Using the docker-tools Infrastructure

This guide walks you through the practical scenarios and workflows for using the docker-tools infrastructure. The `eng/docker-tools` directory is a **shared infrastructure layer** used across all .NET Docker repositories (dotnet-docker, dotnet-buildtools-prereqs-docker, dotnet-framework-docker). It solves a fundamental challenge: building, testing, and publishing Docker images across multiple operating systems (Alpine, Ubuntu, Azure Linux, Windows Server variants), multiple CPU architectures (amd64, arm64, arm32), and multiple .NET versions—all while maintaining consistency and reliability.

At its core, the infrastructure provides:

- **PowerShell scripts** for local image building and Docker operations—so you can test Dockerfile changes on your machine before committing
- **Azure Pipelines templates** for CI/CD (build, test, publish)—a composable template system that orchestrates builds across dozens of OS/architecture combinations in parallel
- **ImageBuilder orchestration**—a specialized .NET tool that understands manifest files, manages image dependencies, handles multi-arch manifest creation, and coordinates the entire build process
- **Caching and optimization**—intelligent systems that skip unchanged images and minimize redundant work
- **SBOM generation**—automatic Software Bill of Materials creation for supply chain security

The infrastructure handles complexity that would otherwise be overwhelming: a single commit to a repo can trigger builds of hundreds of image variants across Linux and Windows agents, each requiring proper build sequencing, testing, and eventual publication to Microsoft Artifact Registry (MAR).

**Important:** Files in `eng/docker-tools/` are synchronized across repositories by automation in the [dotnet/docker-tools](https://github.com/dotnet/docker-tools) repository. If you need to make changes to this infrastructure, submit them there—changes made directly in consuming repos will be overwritten.

---

## Local Development Scenarios

### Scenario: Building Docker Images Locally

The most common local task is building images to test Dockerfile changes before pushing.

**Quick Build - All Images:**
```powershell
./eng/docker-tools/build.ps1
```

**Filter by OS:**
```powershell
# Build only Alpine images
./eng/docker-tools/build.ps1 -OS "alpine"

# Build Ubuntu 24.04 images
./eng/docker-tools/build.ps1 -OS "noble"
```

**Filter by Architecture:**
```powershell
# Build arm64 images only
./eng/docker-tools/build.ps1 -Architecture "arm64"
```

**Filter by Path:**
```powershell
# Build images from a specific directory
./eng/docker-tools/build.ps1 -Paths "src/runtime/8.0/alpine3.20"

# Build all 8.0 runtime images using glob pattern
./eng/docker-tools/build.ps1 -Paths "*runtime*8.0*"
```

**Combine Filters:**
```powershell
# Build .NET 8.0 Alpine arm64 images
./eng/docker-tools/build.ps1 -Version "8.0" -OS "alpine" -Architecture "arm64"
```

**Filter by Product Version (if applicable):**
```powershell
# Build only .NET 8.0 images
./eng/docker-tools/build.ps1 -Version "8.0"

# Build .NET 6.0 and 8.0 images
./eng/docker-tools/build.ps1 -Version "6.0","8.0"
```

### Understanding What Happens Under the Hood

When you run [`build.ps1`](build.ps1), here's the chain of execution:

```
build.ps1
├── Translates your filter parameters into ImageBuilder CLI args
└── Calls Invoke-ImageBuilder.ps1 "build --version X --os-version Y ..."
    ├── On Linux: Runs ImageBuilder in a Docker container
    │   ├── Builds image: microsoft-dotnet-imagebuilder-withrepo
    │   └── Mounts Docker socket and repo contents
    └── On Windows: Extracts ImageBuilder locally (due to Docker-in-Docker limitations)
        └── Runs Microsoft.DotNet.ImageBuilder.exe directly
```
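
The first step of that chain, turning filters into CLI arguments, can be sketched as follows. This is a Python illustration only, not the actual `build.ps1` logic. The chain above shows `--version` and `--os-version`; the `--architecture` and `--path` flag names are assumptions made for the example, as is the function name.

```python
def to_imagebuilder_args(version=None, os_version=None, architecture=None, paths=()):
    """Build an ImageBuilder 'build' command string from optional filters.

    --version and --os-version come from the guide; --architecture and
    --path are assumed flag names used only for illustration.
    """
    args = ["build"]
    if version:
        args += ["--version", version]
    if os_version:
        args += ["--os-version", os_version]
    if architecture:
        args += ["--architecture", architecture]  # assumed flag name
    for path in paths:
        args += ["--path", path]  # assumed flag name
    return " ".join(args)


print(to_imagebuilder_args(version="8.0", os_version="alpine", architecture="arm64"))
```

Filters that are not supplied add nothing to the command, which is why an unfiltered call builds everything.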

### Scenario: Running ImageBuilder Directly

For advanced scenarios, you may want to invoke ImageBuilder with specific commands:

```powershell
# Run any ImageBuilder command
./eng/docker-tools/Invoke-ImageBuilder.ps1 "build --help"

# Generate the build matrix (useful for debugging pipeline behavior)
./eng/docker-tools/Invoke-ImageBuilder.ps1 "generateBuildMatrix --manifest manifest.json --type platformDependencyGraph"

# Validate manifest syntax
./eng/docker-tools/Invoke-ImageBuilder.ps1 "validateManifest --manifest manifest.json"
```

---

## Understanding the Pipeline Architecture

### The Build Flow

The pipeline behaves differently depending on the build context:

**Public PR Builds**:
```
Build Stage
├── PreBuildValidation
├── GenerateBuildMatrix
└── Build Jobs (dry-run, no push)
    └── Inline tests after each build
Post_Build Stage
└── Merge artifacts
Publish Stage (dry-run)
└── All publish operations run but skip actual pushes
(end)
```
- Images are built but **not pushed** to any registry
- Tests run inline within each build job
- Publish stage runs in dry-run mode (validates publish logic without pushing)
- Validates that Dockerfiles build successfully

**Internal Official Builds**:
```
Build Stage
├── PreBuildValidation
├── CopyBaseImages → staging ACR
├── GenerateBuildMatrix
└── Build Jobs (push to staging ACR)
Post_Build Stage
├── Merge image info files
└── Consolidate SBOMs
Test Stage
├── GenerateTestMatrix
└── Test Jobs
Publish Stage
├── Copy images to production ACR
├── Create multi-arch manifests
├── Wait for MAR ingestion
├── Update READMEs
├── Publish image info to versions repo
└── Apply EOL annotations
```
- Full pipeline with all stages
- Images flow: `buildAcr` → `publishAcr` → MAR (see [`publish-config-prod.yml`](templates/stages/dotnet/publish-config-prod.yml) for ACR definitions)
- Tests run against staged images
- Only successful builds get published

### Build Matrix Generation

The `generateBuildMatrix` command is key to understanding how builds are parallelized. It:

1. **Reads the manifest.json** - Understands which images exist
2. **Builds a dependency graph** - Knows that `runtime-deps` must build before `runtime`
3. **Groups by platform** - Creates jobs for each OS/Architecture combo
4. **Optimizes with caching** - Can detect and exclude unchanged images
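
The four steps above can be sketched in a few lines of Python. This is a simplified illustration, not ImageBuilder's implementation: the `IMAGES` table, its naming scheme, and the `build_matrix` function are hypothetical, and the real tool reads this information from `manifest.json`.

```python
from collections import defaultdict
from graphlib import TopologicalSorter

# Hypothetical, simplified manifest: image -> (platform, images it depends on).
IMAGES = {
    "runtime-deps:alpine-amd64": ("alpine/amd64", []),
    "runtime:alpine-amd64": ("alpine/amd64", ["runtime-deps:alpine-amd64"]),
    "runtime-deps:noble-arm64": ("noble/arm64", []),
    "runtime:noble-arm64": ("noble/arm64", ["runtime-deps:noble-arm64"]),
}


def build_matrix(images, unchanged=frozenset()):
    """Return {platform: [images in dependency order]}, skipping unchanged images."""
    # TopologicalSorter expects node -> predecessors, so dependencies sort first.
    graph = {name: deps for name, (_, deps) in images.items()}
    matrix = defaultdict(list)
    for name in TopologicalSorter(graph).static_order():
        if name not in unchanged:
            matrix[images[name][0]].append(name)
    return dict(matrix)


for platform, names in build_matrix(IMAGES).items():
    print(platform, names)
```

Each key of the result corresponds to one parallel job, and the list order within a job guarantees that `runtime-deps` is built before `runtime`.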

### Controlling Which Build Stages Run

The `stages` variable is a comma-separated string that controls which pipeline stages execute:

```yaml
variables:
- name: stages
  value: "build,test,publish" # Run all stages
```

Common patterns:
- `"build"` - Build only, no tests or publishing
- `"build,test"` - Build and test, but don't publish
- `"publish"` - Publish only (when re-running a failed publish from a previous build)
- `"build,test,publish"` - Full pipeline

**Note:** The `Post_Build` stage is implicitly included whenever `build` is in the stages list. You don't need to specify it separately—it automatically runs after Build to merge image info files and consolidate SBOMs.

The stages variable is useful for:
- Re-running just the publish stage after fixing a transient failure
- Skipping tests during initial development
- Running isolated stages for debugging
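
As a rough model of how the `stages` string maps to the stages that actually run, including the implicit `Post_Build` stage, consider this Python sketch. The function name and the lower-cased stage identifiers are assumptions for illustration; the real evaluation happens in the pipeline templates.

```python
def resolve_stages(stages):
    """Expand a comma-separated stages string into the ordered stages that run."""
    requested = {s.strip().lower() for s in stages.split(",") if s.strip()}
    if "build" in requested:
        requested.add("post_build")  # Post_Build is implied whenever build runs
    order = ["build", "post_build", "test", "publish"]
    return [stage for stage in order if stage in requested]


print(resolve_stages("build,test"))
print(resolve_stages("publish"))
```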

### Image Info Files: The Build's Memory

Image info files (defined by [`ImageArtifactDetails`](https://github.com/dotnet/docker-tools/blob/main/src/ImageBuilder/Models/Image/ImageArtifactDetails.cs)) are the mechanism that tracks what was built:

```json
{
  "repos": [{
    "repo": "dotnet/runtime",
    "images": [{
      "platforms": [{
        "dockerfile": "src/runtime/8.0/alpine3.20/amd64/Dockerfile",
        "digest": "sha256:abc123...",
        "created": "2024-01-15T10:30:00Z",
        "commitUrl": "https://github.com/dotnet/dotnet-docker/commit/..."
      }]
    }]
  }]
}
```

**How they flow through the pipeline:**
1. **Build stage**: Each build job produces an image-info fragment
2. **Post_Build stage**: Fragments are merged into a single `image-info.json`
3. **Test stage**: Uses merged info to know which images to test
4. **Publish stage**: Uses info to know which images to copy/publish
5. **Versions repo**: Final info is committed to the versions repo

The [versions repo](https://github.com/dotnet/versions) stores the "source of truth" image info. Future builds compare against this to determine what's changed and skip unchanged images.
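
To make the fragment merge in step 2 concrete, here is a deliberately naive Python sketch that groups images by repo name. It assumes only the `repos`/`repo`/`images` structure shown in the JSON above; the real merge in ImageBuilder is more involved (for example, it has to reconcile platforms that belong to the same image), so treat this as an illustration of the idea rather than the actual algorithm.

```python
def merge_image_info(fragments):
    """Merge image-info fragments into one document, grouping images by repo."""
    merged = {}
    for fragment in fragments:
        for repo in fragment.get("repos", []):
            target = merged.setdefault(repo["repo"], {"repo": repo["repo"], "images": []})
            target["images"].extend(repo.get("images", []))
    return {"repos": list(merged.values())}


amd64 = {"repos": [{"repo": "dotnet/runtime", "images": [{"platforms": [{"digest": "sha256:aaa"}]}]}]}
arm64 = {"repos": [{"repo": "dotnet/runtime", "images": [{"platforms": [{"digest": "sha256:bbb"}]}]}]}
print(len(merge_image_info([amd64, arm64])["repos"][0]["images"]))
```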

**Using Image Info for Investigations**

Image info files are invaluable when you need to track down information about a specific image, particularly when starting from a digest reported by a customer or security scan.

*Scenario: "Which commit produced this image?"*

Given a digest like `sha256:abc123...`, you can trace it back to its source:

1. **Check the versions repo history** - The `dotnet/versions` repo contains historical image info committed after each publish. Use `git log -p --all -S 'sha256:abc123'` to find the commit that introduced this digest.

2. **From the image info entry**, you'll find:
   - `commitUrl` - The exact source commit that built this image
   - `dockerfile` - Which Dockerfile produced it
   - `created` - When it was built
   - `simpleTags` - The tags applied to this image
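
Once you have the right revision of the image info file, the lookup itself is a nested search. The following Python sketch assumes the `repos` → `images` → `platforms` layout shown earlier; the function name is hypothetical and prefix matching is a convenience for truncated digests.

```python
def find_platform_by_digest(image_info, digest_prefix):
    """Return (repo name, platform entry) for the first matching digest, or None."""
    for repo in image_info.get("repos", []):
        for image in repo.get("images", []):
            for platform in image.get("platforms", []):
                if platform.get("digest", "").startswith(digest_prefix):
                    return repo["repo"], platform
    return None


info = {"repos": [{"repo": "dotnet/runtime", "images": [{"platforms": [{
    "dockerfile": "src/runtime/8.0/alpine3.20/amd64/Dockerfile",
    "digest": "sha256:abc123",
}]}]}]}
match = find_platform_by_digest(info, "sha256:abc")
if match:
    repo_name, platform = match
    print(repo_name, platform["dockerfile"])
```

To run it against a real file, load the JSON with `json.load` first.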

*Scenario: "What was in the last successful build?"*

Download the `image-info` artifact from a pipeline run in Azure DevOps:
1. Navigate to the pipeline run
2. Go to the "Published" artifacts section
3. Download `image-info` (merged) or individual `*-image-info-*` fragments

*Scenario: "When did we last publish updates to a specific image?"*

Use the versions repo git history:
```bash
# In the dotnet/versions repo
git log --oneline -- build-info/docker/image-info.dotnet-dotnet-docker-main.json
```

Each commit corresponds to a publish operation and includes the full image info at that point in time.

*Scenario: "Compare what changed between two publishes"*

```bash
git diff <commit1> <commit2> -- build-info/docker/image-info.dotnet-dotnet-docker-main.json
```

This shows which images were added, removed, or rebuilt (new digests) between the two publishes.
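
A raw `git diff` of a large JSON file can be noisy. If you save the two revisions to disk, a short script can summarize the differences instead. This Python sketch is an illustration under the same structural assumptions as the JSON example above; it keys each platform by repo and Dockerfile path, and the function names are hypothetical.

```python
def digests_by_dockerfile(image_info):
    """Map (repo, dockerfile) to digest for every platform entry."""
    return {
        (repo["repo"], platform["dockerfile"]): platform.get("digest")
        for repo in image_info.get("repos", [])
        for image in repo.get("images", [])
        for platform in image.get("platforms", [])
    }


def compare_publishes(old, new):
    """Classify platforms as added, removed, or rebuilt (digest changed)."""
    before, after = digests_by_dockerfile(old), digests_by_dockerfile(new)
    return {
        "added": sorted(after.keys() - before.keys()),
        "removed": sorted(before.keys() - after.keys()),
        "rebuilt": sorted(k for k in before.keys() & after.keys() if before[k] != after[k]),
    }
```

Load each revision with `json.load` and pass the two dictionaries to `compare_publishes`.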

### The Publish Flow in Detail

The publish stage does more than just push images. Here's the sequence:

1. **Copy Images** — `copyAcrImages` copies from build ACR to publish ACR
2. **Publish Manifest** — `publishManifest` creates multi-arch manifest lists
3. **Wait for MAR Ingestion** — Polls MAR until images are available (timeout configurable)
4. **Publish READMEs** — Updates documentation in the registry
5. **Wait for Doc Ingestion** — Ensures README changes are live
6. **Merge & Publish Image Info** — Updates the versions repo with new image metadata
7. **Ingest Kusto Image Info** — Sends telemetry to Kusto for analytics
8. **Generate & Apply EOL Annotations** — Marks images with end-of-life dates
9. **Post Publish Notification** — Creates GitHub issues/notifications about the publish
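
Steps 3 and 5 are bounded polling loops: check, wait, and give up after a timeout. A generic Python sketch of that pattern is shown here. `is_ready` is a hypothetical callback standing in for whatever availability check the pipeline performs, and the 30-second interval is an arbitrary default, not a value taken from the infrastructure.

```python
import time


def wait_until(is_ready, timeout_s, interval_s=30, sleep=time.sleep, clock=time.monotonic):
    """Poll is_ready() until it returns True or timeout_s seconds have elapsed."""
    deadline = clock() + timeout_s
    while True:
        if is_ready():
            return True
        if clock() >= deadline:
            return False  # timed out; the caller decides whether to fail the stage
        sleep(interval_s)
```

The `sleep` and `clock` parameters exist so the loop can be exercised without real waiting.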

### Dry-Run Mode

For testing pipeline changes without actually publishing:

```yaml
# In pipeline variables or at runtime
variables:
- name: dryRunArg
  value: "--dry-run"
```

The infrastructure also enables dry-run automatically for:
- Pull request builds
- Builds from non-official branches
- Public project builds

The [`set-dry-run.yml`](templates/steps/set-dry-run.yml) step template determines this automatically based on context.
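
The rule described above amounts to a single boolean expression. The Python sketch below restates it for clarity; the parameter names are hypothetical and the actual template may consult different or additional signals.

```python
def should_dry_run(is_pull_request, is_official_branch, is_public_project, forced=False):
    """Return True when publish operations should skip real pushes."""
    return forced or is_pull_request or not is_official_branch or is_public_project


# An internal build from an official branch is the only case that really publishes.
print(should_dry_run(is_pull_request=False, is_official_branch=True, is_public_project=False))
```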

---

## Automatic Image Rebuilds

The infrastructure includes automation that monitors for base image updates and triggers rebuilds when dependencies change.

### How It Works

A scheduled pipeline ([`check-base-image-updates.yml`](https://github.com/dotnet/docker-tools/blob/main/eng/pipelines/check-base-image-updates.yml)) runs every 4 hours and:

1. **Checks for stale images** — Compares the base image digests used in our published images against the current digests in upstream registries
2. **Identifies affected images** — Determines which of our images need rebuilding because their base image changed
3. **Queues targeted builds** — Automatically triggers builds for only the affected images, not the entire repo

This ensures that security patches and updates in base images (like `alpine`, `ubuntu`, `mcr.microsoft.com/windows/nanoserver`) flow through to images without manual intervention.

### Failure Handling and Recovery

The system has built-in retry logic but requires manual intervention after repeated failures:

**Automatic retry behavior:**
- If a triggered build fails, the system will attempt to rebuild every 4 hours
- After **3 unsuccessful attempts**, the system stops queuing new builds for that image
- This prevents endless rebuild loops when there's a genuine issue requiring human attention

**After fixing the issue:**

Once you've fixed the underlying problem (Dockerfile change, test fix, etc.) and have a successful build:

1. Navigate to the successful pipeline run in Azure DevOps
2. Add the `autobuilder` label to that run
3. This signals to the infrastructure that a successful build has occurred
4. The system will resume automatic rebuilds for that image as needed

The `autobuilder` label is how the infrastructure tracks that the failure cycle has been broken and normal operations can resume.
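
One way to picture the retry limit and the label reset is as a count of failures since the most recent successful run that carries the `autobuilder` label. The Python sketch below models that idea. How the real system counts attempts is not described here, so the run records, field names, and counting rule are assumptions for illustration.

```python
MAX_ATTEMPTS = 3  # from the guide: stop after 3 unsuccessful attempts


def should_queue_rebuild(recent_runs):
    """Decide whether to queue another automatic rebuild.

    recent_runs is a newest-first list of dicts with 'succeeded' (bool)
    and 'labels' (list of str); this record shape is hypothetical.
    """
    failures = 0
    for run in recent_runs:
        if run["succeeded"] and "autobuilder" in run["labels"]:
            break  # a labeled success ends the failure streak
        if not run["succeeded"]:
            failures += 1
    return failures < MAX_ATTEMPTS
```

Under this model, labeling a successful run places it at the head of the history, so the failure count returns to zero and rebuilds resume.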

---

## Common Customization Patterns

### Pattern: Adding Build Arguments

Pass additional arguments to Docker builds via ImageBuilder:

```yaml
customBuildInitSteps:
- powershell: |
    # Use a custom name: $args is a PowerShell automatic variable
    $buildArgs = "--build-arg MY_VAR=value"
    echo "##vso[task.setvariable variable=imageBuilderBuildArgs]$buildArgs"
```

### Pattern: Re-running Stages with `stages` and `sourceBuildPipelineRunId`

A powerful pattern is combining the `stages` variable with the `sourceBuildPipelineRunId` pipeline parameter to run specific stages using artifacts from a previous build. This is useful for:
1. Skipping stages you don't need to run
2. Avoiding unnecessary re-builds after test/publishing infrastructure fixes

Note: For simple retries of failed jobs, use the Azure Pipelines UI "Re-run failed jobs" feature instead.

**Scenario: Test failed, need to run publish anyway**

* Set `sourceBuildPipelineRunId` to the build which built the images
* Set `stages` to `publish`

**How it works:**

1. `sourceBuildPipelineRunId` tells the pipeline which previous run to pull artifacts from
2. The [`download-build-artifact.yml`](templates/steps/download-build-artifact.yml) step uses this ID to fetch `image-info.json` from that run
3. Specified stage(s) use the downloaded image info to know which images exist

**Common recovery patterns:**

| Scenario | stages | sourceBuildPipelineRunId |
|----------|--------|--------------------------|
| Normal full build | `"build,test,publish"` | `$(Build.BuildId)` (default) |
| Re-run publish after infra fix | `"publish"` | ID of the successful build run |
| Re-test after infra fix | `"test"` | ID of the build run to test |
| Build only (no publish) | `"build"` | `$(Build.BuildId)` (default) |
| Test + publish (skip build) | `"test,publish"` | ID of the build run |
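
The table can be read as a small decision rule for which run the artifacts come from. The Python sketch below models it. The function name is hypothetical, and the error for a missing ID is an inference from the table (a run that skips `build` has no artifacts of its own), not documented pipeline behavior.

```python
def artifact_source_run(stages, current_run_id, source_run_id=None):
    """Return the ID of the pipeline run whose image-info artifacts are used."""
    requested = {s.strip() for s in stages.split(",") if s.strip()}
    if source_run_id is not None:
        return source_run_id  # explicit sourceBuildPipelineRunId wins
    if "build" not in requested:
        raise ValueError("stages without 'build' need sourceBuildPipelineRunId from an earlier run")
    return current_run_id  # default: this run produces its own artifacts


print(artifact_source_run("publish", current_run_id=200, source_run_id=123456))
```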

**In the Azure DevOps UI:**

When you queue a new run, you can override these as runtime parameters:
1. Set `stages` to the stage(s) you want to run
2. Set `sourceBuildPipelineRunId` to the run ID containing the artifacts you need (find the build ID in the URL when viewing a pipeline run, e.g., `buildId=123456`)

This avoids the multi-hour rebuild cycle when you just need to retry a failed operation.
4 changes: 3 additions & 1 deletion eng/docker-tools/readme.md
Expand Up @@ -25,4 +25,6 @@

!!! Changes made in this directory are subject to being overwritten by automation !!!

The files in this directory are shared by all .NET Docker repos. If you need to make changes to these files, open an issue or submit a pull request in https://github.com/dotnet/docker-tools.

For guidance on using this infrastructure, see the [Developer Guide](DEV-GUIDE.md).