[Hyperfleet 547]: Pull Secret Service DDR#89

Open

ldornele wants to merge 2 commits intoopenshift-hyperfleet:mainfrom

ldornele:HYPERFLEET-547

Contributor

ldornele commented Feb 16, 2026

This PR introduces the Pull Secret Service - Design Decision Record (DDR) documentation for HyperFleet.

ldornele added 2 commits

February 16, 2026 16:53


          HYPERFLEET-547: Pull Secret Service DDR

b2dd063


          HYPERFLEET-547: rename section 7 to Scalability

337e022

Contributor

coderabbitai bot commented Feb 16, 2026

Important

Review skipped

Review was skipped as selected files did not have any reviewable changes.

💤 Files selected but had no reviewable changes (1)

hyperfleet/docs/pull-secret-service-ddr.md

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

✨ Finishing touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

Tip

Issue Planner is now in beta. Read the docs and try it out! Share your feedback on Discord.

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

ciaranRoche reviewed

View reviewed changes

Contributor

ciaranRoche left a comment

Some work gone into this document, thanks for sticking with it and putting it through

hyperfleet/docs/pull-secret-service-ddr.md

    
              ## 2. Architecture Components

              > **🔗 Design Principles Applied:**

Contributor

ciaranRoche Feb 17, 2026

Should we include error recovery and partial failure handling. There is no discussion of what happens when external API calls fail. For example what if Quay succeeds but RHIT fails?

hyperfleet/docs/pull-secret-service-ddr.md

    
              ---

              #### 3.2.3 Trigger Credential Rotation

Contributor

ciaranRoche Feb 17, 2026

Can you add more details on credential rotation as it is not 100% clear. What is the dual credential period?
How long as both credentials valid?
How does the service know the new creds are being used by the cluster
What triggers deletion of old credentials?
Who is the actor that confirms the cluster is healthy with new creds?

hyperfleet/docs/pull-secret-service-ddr.md

    
              **Endpoint**: `POST /v1/clusters/{cluster_id}/pull-secrets`

              **Purpose**: Generate or retrieve pull secret for a cluster. Idempotent operation - returns existing credentials if already created.

Contributor

ciaranRoche Feb 17, 2026

Can you add some more information as to the how this is idempotent, what is the key? How are concurrent requests handled 🙏

hyperfleet/docs/pull-secret-service-ddr.md

    
                      EP4[POST /v1/clusters/ID/pull-secrets/rotations<br/>Trigger rotation]

                      EP5[GET /v1/clusters/ID/pull-secrets/rotations<br/>List rotations]

                      EP6[GET /v1/clusters/ID/pull-secrets/rotations/ID<br/>Get rotation status]

                      EP7[GET /v1/health<br/>Health check]

Contributor

ciaranRoche Feb 17, 2026

Following the standards in hyperfleet we have healthz and readyz

I also think it is worth mentioning what these actually check, liveness vs readiness!

hyperfleet/docs/pull-secret-service-ddr.md

    
                      EP5[GET /v1/clusters/ID/pull-secrets/rotations<br/>List rotations]

                      EP6[GET /v1/clusters/ID/pull-secrets/rotations/ID<br/>Get rotation status]

                      EP7[GET /v1/health<br/>Health check]

                      EP8[GET /v1/metrics<br/>Prometheus metrics]

Contributor

ciaranRoche Feb 17, 2026

Might be worth mentioning some metrics that are exposed, to help later on in determining SLI/SLO's for the service

hyperfleet/docs/pull-secret-service-ddr.md

    
              **Benefits**:

              - ✅ Single deployment to manage

              - ✅ Shared credential pool (cost efficient)

Contributor

ciaranRoche Feb 17, 2026

I see credential pool dotted around this document, can you add an explanation to what this pool is please

hyperfleet/docs/pull-secret-service-ddr.md

    
                  GeneratePullSecretRequest:

                    type: object

                    required:

                      - cluster_external_id

Contributor

ciaranRoche Feb 17, 2026

Is this correct, prob some claude stuff left over, seems cluster_id, cluster_external_ID and ClusterID get interchanged in this document.

hyperfleet/docs/pull-secret-service-ddr.md

    
              - ❌ **Pool Inefficiency**: Total pool size = N × high_water_mark (over-provisioning)

              - ❌ **Operational Overhead**: Must manage N instances (upgrades, monitoring, backups)

              **Cost Estimate (3 HyperFleet instances)**:

Contributor

ciaranRoche Feb 17, 2026

NP : we can probably remove the cost estimates from this document. Focus more on the use cases vs the cost

rafabene reviewed

View reviewed changes

hyperfleet/docs/pull-secret-service-ddr.md

    
              }

              ```

              **Response 400 Bad Request**:

Contributor

rafabene Feb 18, 2026 •

edited

Loading

The error responses bellow are inconsistent with the
https://github.com/openshift-hyperfleet/architecture/blob/main/hyperfleet/stan
dards/error-model.md

All error responses in this DDR should follow the standard format, e.g.:

  {
    "type": "https://api.hyperfleet.io/errors/resource-not-found",
    "title": "Resource Not Found",
    "status": 404,
    "detail": "Pull secret for cluster 'cls-123' not found",
    "instance": "/v1/clusters/cls-123/pull-secrets",
    "code": "HYPERFLEET-NTF-001",
    "timestamp": "2025-01-15T10:31:00.456Z",
    "trace_id": "5cf92f3577b34da6a3ce929d0e0e4737"
  }

rafabene reviewed

View reviewed changes

hyperfleet/docs/pull-secret-service-ddr.md

    
              ### 3.3 Error Handling

              All error responses follow RFC 7807 (Problem Details for HTTP APIs):

Contributor

rafabene Feb 18, 2026

Section 3.3 references RFC 7807, but HyperFleet uses RFC 9457 (the successor) with application/problem+json

rafabene reviewed

View reviewed changes

hyperfleet/docs/pull-secret-service-ddr.md

    
                redhat:

                  enabled: true

                  partner_code: "ocm-service"

Contributor

rafabene Feb 18, 2026

The RHIT registry configuration example uses partner_code: "ocm-service", which
directly contradicts Principle 6 (Dedicated Partner Code) defined earlier in this same
document.

Principle 6 explicitly states:
"HyperFleet must establish its own partner identity with dedicated partner code hyperfleet"
"Using ocm-service would create quota conflicts and billing confusion"

The configuration should be:

  rhit:
    enabled: true
    base_url: "https://registry.access.redhat.com/api/v1"
    partner_code: "hyperfleet"  # NOT "ocm-service" - see Principle 6
    cert_secret: "rhit-client-cert"

This looks like a copy-paste artifact from the AMS codebase analysis.

rafabene reviewed

View reviewed changes

hyperfleet/docs/pull-secret-service-ddr.md

    
              **Layer 3: Encryption at Rest** - *Data protected in storage*

              - **Purpose**: Encrypt tokens/passwords before saving to database

              - **Implementation**: PostgreSQL `pgcrypto` extension with AES-256-GCM encryption

Contributor

rafabene Feb 18, 2026

The document claims "AES-256-GCM via pgcrypto" encryption (line ~554), but the code examples
(line ~611) use pgp_sym_encrypt / pgp_sym_decrypt, which is OpenPGP symmetric encryption, not
AES-256-GCM.

Key issues:

Wrong algorithm: pgcrypto's pgp_sym_encrypt defaults to AES-128 in CFB mode, not
AES-256-GCM. These are different algorithms with different security properties (GCM provides
authenticated encryption; CFB does not).
Key exposure risk: Using pgp_sym_encrypt means the encryption key is sent to PostgreSQL in
every SQL query. This key can appear in:
- PostgreSQL query logs (log_statement = 'all')
- pg_stat_statements view
- pg_stat_activity during execution
- Database audit logs
No key rotation strategy: The document mentions 90-day key rotation but doesn't discuss how existing rows would be re-encrypted with a new key.

For a credential management service, consider application-level encryption (encrypt in Go
before DB insert) using crypto/aes with GCM mode. This keeps the key out of database queries
entirely and matches the claimed algorithm

rafabene reviewed

View reviewed changes

hyperfleet/docs/pull-secret-service-ddr.md

    
                          Pod3[Pod 3<br/>API :8080<br/>Metrics :8081]

                      end

                      Service[Service: pull-secret-service<br/>Type: ClusterIP<br/>Ports: 80 API, 8081 Metrics]

Contributor

rafabene Feb 18, 2026 •

edited

Loading

Per the https://github.com/openshift-hyperfleet/architecture/blob/main/hyperfleet/standards/health-endpoints.md and https://github.com/openshift-hyperfleet/architecture/blob/main/hyperfleet/standards/metrics.md should adopt the standard port 9090 for metrics

rafabene reviewed

View reviewed changes

hyperfleet/docs/pull-secret-service-ddr.md

    
                      cluster_external_id:

                        type: string

                        example: cls-abc123def456

                      cloud_provider:

Contributor

rafabene Feb 18, 2026

This contradicts Principle 2 (Cloud Agnostic), which states:
"No cloud-specific parameters in API contract"

It creates cloud-specific coupling in what should be a cloud-agnostic service
Adding a new cloud provider requires an API schema change (new enum value)
The caller (adapter) shouldn't need to tell the service which cloud it runs on — the
service shouldn't care

rafabene reviewed

View reviewed changes

hyperfleet/docs/pull-secret-service-ddr.md

    
                  style RHIT fill:#f3e5f5,stroke:#4a148c,stroke-width:2px

              ```

              ### 4.4 Kubernetes Deployment Topology (Both Options)

Contributor

rafabene Feb 18, 2026

Section "4.4 Kubernetes Deployment Topology (Both Options)" appears twice

rafabene reviewed

View reviewed changes

hyperfleet/docs/pull-secret-service-ddr.md

    
              The Pull Secret Service exposes a **RESTful API** for credential lifecycle management. All endpoints are authenticated using Kubernetes ServiceAccount tokens and follow standard HTTP semantics.

              ### 3.1 API Overview

Contributor

rafabene Feb 18, 2026

The API Overview diagram lists 8 endpoints, but only 5 are specified in the "Core
Endpoints" section. Two endpoints are missing their detailed specification:

1. DELETE /v1/clusters/{cluster_id}/pull-secrets

No documentation for:

Expected behavior (soft delete? hard delete? cascade to rotations?)
What happens to active/pending rotations when credentials are deleted
Response body and status codes (200 vs 204?)
Whether it revokes credentials at the registry level (Quay robot account deletion, RHIT SA
removal) or only deletes the local record

2. GET /v1/clusters/{cluster_id}/pull-secrets/rotations

No documentation for:

Pagination strategy (offset-based? cursor-based?)
Filter/query parameters (by status? by date range?)
Response format and sort order
Whether it returns all historical rotations or only recent ones

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet