Skip to content

Conversation

@danielmellado
Copy link
Contributor

@danielmellado danielmellado commented Dec 17, 2025

This commit introduces a new API to be introduced as a part of the
migration in CMO from ConfigMap to CRDs

Signed-off-by: Mario Fernandez mariofer@redhat.com
Signed-off-by: Daniel Mellado dmellado@fedoraproject.org

@openshift-ci-robot
Copy link

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To trigger manually all jobs from second stage use /pipeline required command.

This repository is configured in: LGTM mode

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Dec 17, 2025

Hello @danielmellado! Some important instructions when contributing to openshift/api:
API design plays an important part in the user experience of OpenShift and as such API PRs are subject to a high level of scrutiny to ensure they follow our best practices. If you haven't already done so, please review the OpenShift API Conventions and ensure that your proposed changes are compliant. Following these conventions will help expedite the api review process for your PR.

@coderabbitai
Copy link

coderabbitai bot commented Dec 17, 2025

Walkthrough

Adds a new spec.prometheusConfig to ClusterMonitoring with extensive Prometheus/Alertmanager configuration, refactors volumeClaimTemplate to a simplified size/storage schema, and introduces many new types plus corresponding generated deepcopy, OpenAPI, Swagger, CRD, and test updates.

Changes

Cohort / File(s) Summary
Type Definitions
config/v1alpha1/types_cluster_monitoring.go
Introduces PrometheusConfig and many supporting types/constants (e.g., AdditionalAlertmanagerConfig, RemoteWriteSpec, RelabelConfig, TLSConfig, AuthorizationConfig, SecretKeySelector, VolumeClaimConfig, Retention, Label, RelabelAction, CollectionProfile, etc.), adds PrometheusConfig to ClusterMonitoringSpec, and replaces PVC pointer usage with VolumeClaimConfig plus validation tags.
CRD Manifests & Featuregated Manifests
config/v1alpha1/zz_generated.crd-manifests/...clustermonitorings-*.crd.yaml, config/v1alpha1/zz_generated.featuregated-crd-manifests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml, payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-*.crd.yaml
Adds spec.prometheusConfig schema (additionalAlertmanagerConfigs, collectionProfile, enforcedBodySizeLimitBytes, externalLabels, logLevel, nodeSelector, queryLogFile, remoteWrite, resources, retention, tolerations, topologySpreadConstraints, volumeClaimTemplate, etc.). Simplifies volumeClaimTemplate for Alertmanager/Prometheus to sizeInGiB and storageClassName with DNS subdomain validation and minProperties: 1.
Tests
config/v1alpha1/tests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml
Adds many new positive and negative validation test cases exercising Prometheus-related configuration paths (PrometheusK8sConfig via PrometheusConfig), covering nodeSelector, resources, tolerations, topologySpreadConstraints, additionalAlertmanagerConfigs, remoteWrite, externalLabels, and limit/invalid-value rejections.
Generated Deepcopy
config/v1alpha1/zz_generated.deepcopy.go
Adds DeepCopyInto/DeepCopy implementations for new types (e.g., AdditionalAlertmanagerConfig, AuthorizationConfig, PrometheusConfig, RelabelConfig, RemoteWriteSpec, Retention, SecretKeySelector, TLSConfig, VolumeClaimConfig, Label) and updates ClusterMonitoringSpec.DeepCopyInto to include PrometheusConfig.
Generated Swagger docs
config/v1alpha1/zz_generated.swagger_doc_generated.go
Adds SwaggerDoc maps and SwaggerDoc() methods for new types and documents prometheusConfig (and associated fields) in ClusterMonitoringSpec.
Generated OpenAPI
openapi/generated_openapi/zz_generated.openapi.go
Adds OpenAPI definitions/builders and registers new v1alpha1 schemas for the new types (e.g., PrometheusConfig, AdditionalAlertmanagerConfig, RemoteWriteSpec, RelabelConfig, TLSConfig, VolumeClaimConfig, SecretKeySelector, Retention, Label, PublicKey, etc.) and integrates them into GetOpenAPIDefinitions.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Pay special attention to:
    • config/v1alpha1/types_cluster_monitoring.go — new types, validation tags, VolumeClaimConfig semantics and JSON tags.
    • CRD manifest variants (zz_generated*.crd.yaml, payload-manifests/crds/*) — schema consistency, x-kubernetes-validations (storageClassName DNS rules, min/max constraints), and required property changes.
    • config/v1alpha1/tests/...ClusterMonitoringConfig.yaml — ensure test expectations align with CRD validations and error messages.
    • Generated artifacts (zz_generated.deepcopy.go, zz_generated.swagger_doc_generated.go, openapi/generated_openapi/zz_generated.openapi.go) — confirm generation consistency and that references/refs match the new types.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.5.0)

Error: build linters: unable to load custom analyzer "kubeapilinter": tools/_output/bin/kube-api-linter.so, plugin: not implemented
The command is terminated due to an error: build linters: unable to load custom analyzer "kubeapilinter": tools/_output/bin/kube-api-linter.so, plugin: not implemented


Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci bot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Dec 17, 2025
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Dec 17, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign everettraven for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (2)
config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yaml (2)

453-485: Volume storage schema matches feature‑gated CRD

The Alertmanager volumeClaimTemplate schema here matches the feature‑gated CRD (sizeInGiB + storageClassName, with sane bounds and DNS validation). Keeping these in sync via generation is good for maintainability.


895-1876: TechPreview prometheusConfig matches feature‑gated schema

This prometheusConfig definition mirrors the feature‑gated manifest: same enums, limits, CEL rules, and nested structures. That consistency across variants is important for tooling and user documentation; no additional issues spotted.

🧹 Nitpick comments (2)
config/v1alpha1/tests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml (1)

354-776: Add negative tests for relabel targetLabel requirement and staticConfigs validation

The new prometheusConfig tests cover most structural and cardinality constraints well. Once you tighten the CRD validation for writeRelabelConfigs.targetLabel, it would be worthwhile to add a negative case where action: Replace (or HashMod) is set without targetLabel and assert it is rejected. Similarly, a test that feeds a malformed staticConfigs entry (e.g. missing :<port>) would guard the host:port CEL rule from regressions.

payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yaml (1)

1063-1198: TLS config allows potentially incomplete mTLS configuration.

The tlsConfig section has minProperties: 1 but doesn't enforce that cert and key are provided together for mutual TLS. A user could specify cert without key or vice versa, which would result in an incomplete mTLS configuration. Consider adding a validation rule to ensure both are present or both are absent.

Example CEL rule to add:

x-kubernetes-validations:
- message: cert and key must both be specified for mutual TLS, or both omitted
  rule: '(has(self.cert) && has(self.key)) || (!has(self.cert) && !has(self.key))'
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between bfa868a and 43ae844.

📒 Files selected for processing (12)
  • config/v1alpha1/tests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml (1 hunks)
  • config/v1alpha1/types_cluster_monitoring.go (4 hunks)
  • config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yaml (2 hunks)
  • config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yaml (2 hunks)
  • config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yaml (2 hunks)
  • config/v1alpha1/zz_generated.deepcopy.go (10 hunks)
  • config/v1alpha1/zz_generated.featuregated-crd-manifests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml (2 hunks)
  • config/v1alpha1/zz_generated.swagger_doc_generated.go (7 hunks)
  • openapi/generated_openapi/zz_generated.openapi.go (13 hunks)
  • payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yaml (2 hunks)
  • payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yaml (2 hunks)
  • payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yaml (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.

Files:

  • config/v1alpha1/tests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml
  • payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yaml
  • config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yaml
  • config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yaml
  • config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yaml
  • payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yaml
  • config/v1alpha1/zz_generated.swagger_doc_generated.go
  • config/v1alpha1/zz_generated.featuregated-crd-manifests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml
  • config/v1alpha1/types_cluster_monitoring.go
  • config/v1alpha1/zz_generated.deepcopy.go
  • payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yaml
  • openapi/generated_openapi/zz_generated.openapi.go
🧬 Code graph analysis (1)
config/v1alpha1/zz_generated.swagger_doc_generated.go (1)
config/v1alpha1/types_cluster_monitoring.go (10)
  • AdditionalAlertmanagerConfig (610-671)
  • AuthorizationConfig (857-870)
  • Label (674-688)
  • PrometheusConfig (434-597)
  • RelabelConfig (731-797)
  • RemoteWriteSpec (692-727)
  • Retention (916-936)
  • SecretKeySelector (874-891)
  • TLSConfig (802-830)
  • VolumeClaimConfig (895-912)
🔇 Additional comments (39)
config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yaml (1)

453-485: Alertmanager PVC schema simplification looks good

The new volumeClaimTemplate schema (required sizeInGiB with sane bounds + optional storageClassName with DNS-1123 validation) is clear, constrained, and much easier to reason about than a full PVC spec. No issues from a correctness or maintainability standpoint.

payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yaml (2)

453-485: Alertmanager PVC schema change is consistent and maintainable

The simplified volumeClaimTemplate for Alertmanager here matches the CustomNoUpgrade CRD and keeps the schema tight (bounded sizeInGiB, validated storageClassName). This looks correct and easier to support across payload variants.


895-1876: The prometheusConfig block does not duplicate an earlier schema. The writeRelabelConfigs section appears exactly once in this configuration, properly nested within remoteWrite (a property of prometheusConfig). The staticConfigs are in additionalAlertmanagerConfigs, a separate configuration section. The CEL validation rule self.action in [''Replace'', ''HashMod''] ? self.targetLabel != '''' : true correctly enforces the documented requirement that targetLabel is mandatory for Replace and HashMod actions. No consistency issues exist between TechPreviewNoUpgrade and CustomNoUpgrade variants for these fields.

Likely an incorrect or invalid review comment.

payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yaml (2)

453-485: LGTM: volumeClaimTemplate API simplification improves usability.

The simplified volumeClaimTemplate structure (sizeInGiB + storageClassName) is clearer and more maintainable than exposing the full PVC spec. The required sizeInGiB field ensures explicit storage allocation, while optional storageClassName allows platform defaults. Validation is comprehensive with proper bounds and DNS subdomain checking.


895-1876: LGTM: Comprehensive prometheusConfig API with thorough validation.

The new prometheusConfig field provides extensive configuration surface for platform Prometheus with well-designed validation:

  • Proper security defaults (certificateVerification: "Verify", documented HTTP/HTTPS scheme choice)
  • Reasonable bounds on all numeric constraints (timeouts, storage limits, array sizes)
  • Comprehensive CEL validation rules ensuring API correctness
  • List-map configurations supporting server-side apply workflows
  • Clear documentation throughout the nested structure

The API design enables the ConfigMap-to-CRD migration mentioned in the PR description while maintaining security and usability.

config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yaml (1)

1-1909: LGTM: DevPreviewNoUpgrade CRD variant is consistent with CustomNoUpgrade.

This CRD manifest correctly mirrors the CustomNoUpgrade variant (first file) with only the expected difference in feature-set annotation (line 10). The volumeClaimTemplate simplification and prometheusConfig additions are identical, ensuring consistency across release channels. The zz_generated file prefix correctly indicates auto-generation from Go types.

payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yaml (4)

453-485: LGTM on volumeClaimTemplate structure.

The simplified volumeClaimTemplate schema with sizeInGiB (required, 1-16384 range) and storageClassName (optional, DNS subdomain validated) is well-designed. The minProperties: 1 constraint combined with required: [sizeInGiB] ensures valid configurations.


895-908: Well-structured prometheusConfig section.

The top-level prometheusConfig field is comprehensive and follows the same patterns as other config sections in this CRD. The minProperties: 1 constraint ensures the field isn't set to an empty object, which is appropriate.


1843-1875: LGTM on Prometheus volumeClaimTemplate.

The volumeClaimTemplate for Prometheus is consistent with the Alertmanager volumeClaimTemplate structure, ensuring API consistency across the CRD.


1476-1480: No action required—targetLabel validation rule is correct.

The CEL validation rule correctly enforces that targetLabel is required for Replace and HashMod actions. When targetLabel is omitted (as an optional field), it evaluates to an empty string in CEL, causing self.targetLabel != '' to evaluate to false and reject the validation for Replace/HashMod actions. For other actions, the rule correctly permits omitting targetLabel. The minLength: 1 constraint ensures non-empty values when the field is present.

Likely an incorrect or invalid review comment.

config/v1alpha1/zz_generated.swagger_doc_generated.go (3)

121-134: LGTM on AdditionalAlertmanagerConfig Swagger documentation.

The Swagger documentation accurately describes all fields of the AdditionalAlertmanagerConfig type, including name, authorization, pathPrefix, scheme, staticConfigs, timeoutSeconds, and tlsConfig.


256-275: LGTM on PrometheusConfig Swagger documentation.

The PrometheusConfig SwaggerDoc map comprehensively documents all configuration options including additionalAlertmanagerConfigs, enforcedBodySizeLimitBytes, externalLabels, logLevel, nodeSelector, queryLogFile, remoteWrite, resources, retention, tolerations, topologySpreadConstraints, collectionProfile, and volumeClaimTemplate.


346-354: LGTM on VolumeClaimConfig Swagger documentation.

The VolumeClaimConfig documentation correctly describes the storageClassName and sizeInGiB fields with their constraints and default behaviors.

config/v1alpha1/zz_generated.featuregated-crd-manifests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml (2)

453-485: Alertmanager volume storage schema looks consistent and constrained

The simplified volumeClaimTemplate (sizeInGiB + storageClassName with DNS validation and minProperties/required) aligns with the new Prometheus storage shape and keeps user input tightly bounded. No issues from a schema or UX perspective.


895-1876: PrometheusConfig surface and validations look well‑designed

The new prometheusConfig block provides a rich but bounded configuration surface (limits on list sizes, strong enums, URL and path validation, DNS validation for names, etc.), and follows existing patterns used elsewhere in this CRD. I don’t see correctness, security, or maintainability problems in this schema addition.

config/v1alpha1/zz_generated.deepcopy.go (3)

55-91: VolumeClaimTemplate deepcopy usage assumes value types (looks correct)

Both AlertmanagerCustomConfig.DeepCopyInto and PrometheusConfig.DeepCopyInto call in.VolumeClaimTemplate.DeepCopyInto(&out.VolumeClaimTemplate) without a nil check. This is appropriate as long as VolumeClaimTemplate/VolumeClaimConfig are value fields (not pointers), which matches the generated VolumeClaimConfig.DeepCopyInto and the simplified CRD schema. I don’t see a risk of nil dereference given that setup.

Also applies to: 982-1035


14-35: Deepcopy semantics for new helper types are safe and non‑aliasing

For AdditionalAlertmanagerConfig, AuthorizationConfig, Label, RelabelConfig, RemoteWriteSpec, Retention, SecretKeySelector, TLSConfig, and VolumeClaimConfig, the generated DeepCopy* methods correctly:

  • Deep‑copy slices (e.g., StaticConfigs, SourceLabels, WriteRelabelConfigs).
  • Allocate new pointers where needed (e.g., Replacement, StorageClassName).
  • Use value copy for structs that only contain value fields.

This avoids shared mutable state between copies and aligns with Kubernetes codegen patterns.

Also applies to: 120-135, 722-736, 1073-1120, 1099-1120, 1122-1136, 1196-1210, 1233-1250, 1268-1287


399-405: ClusterMonitoringSpec now deep‑copies PrometheusConfig correctly

ClusterMonitoringSpec.DeepCopyInto now invokes in.PrometheusConfig.DeepCopyInto(&out.PrometheusConfig), and PrometheusConfig.DeepCopyInto recursively handles all nested slices, maps, and structs (alertmanager configs, externalLabels, nodeSelector, remoteWrite, resources, retention, tolerations, topologySpreadConstraints, volumeClaimTemplate). This ensures controllers manipulating the spec won’t accidentally share internal state across copies.

Also applies to: 982-1035

openapi/generated_openapi/zz_generated.openapi.go (13)

424-477: LGTM - Type mappings are correctly registered.

The new v1alpha1 types are properly added to the OpenAPI definitions map in alphabetical order, with each type having a corresponding schema builder function.


21229-21305: LGTM - AdditionalAlertmanagerConfig schema is well-defined.

The schema correctly requires name and staticConfigs, uses appropriate list type annotations, and properly declares dependencies on AuthorizationConfig and TLSConfig.


21451-21463: LGTM - volumeClaimTemplate correctly refactored to use shared type.

The change to reference VolumeClaimConfig instead of an inline definition improves type reuse consistency across AlertmanagerConfig and PrometheusConfig.


21487-21527: LGTM - AuthorizationConfig uses proper discriminated union pattern.

The schema correctly implements a discriminated union with type as the discriminator, and securely references secrets via SecretKeySelector rather than inline values.


21964-21982: LGTM - prometheusConfig field properly integrated into ClusterMonitoringSpec.

The field is appropriately optional, well-documented, and dependencies are correctly declared.


22470-22496: LGTM - Label schema is clean and minimal.


22938-23139: LGTM - PrometheusConfig provides comprehensive and well-structured configuration surface.

Key observations:

  • All fields are optional, enabling incremental adoption
  • List types use appropriate x-kubernetes-list-type annotations (map for uniqueness by key, atomic for tolerations)
  • Map keys are correctly specified for server-side apply merging
  • Dependencies are complete and accurate

23169-23243: LGTM - RelabelConfig schema correctly models Prometheus relabeling rules.

The required fields (name, action) make sense for identifying and executing relabel rules, while source/target configuration remains optional based on action type.


23245-23302: LGTM - RemoteWriteSpec provides clean remote write configuration.

The url is correctly required, and writeRelabelConfigs properly uses map-type list with name as the key.


23353-23399: LGTM - RetentionPolicy correctly implements discriminated union.

The schema properly uses x-kubernetes-unions to discriminate between RetentionNumber and RetentionSize configurations.


23425-23456: LGTM - SecretKeySelector is correctly designed.

The atomic map type is appropriate for immutable secret references, and the namespace scoping to openshift-monitoring is clearly documented.


23488-23536: LGTM - TLSConfig schema is well-designed.

The schema correctly references secrets via SecretKeySelector for all certificate/key fields. The certificateVerification field appropriately documents the security implications of SkipVerify. The "at least one option must be specified" constraint is likely enforced via CEL validation in the source type definition.


23561-23587: LGTM - VolumeClaimConfig is clean and practical.

Requiring sizeInGiB while making storageClassName optional allows users to rely on the cluster's default StorageClass, which is a good default experience.

config/v1alpha1/types_cluster_monitoring.go (8)

92-104: LGTM!

The prometheusConfig field is well-documented with clear explanations of what it configures and follows the established pattern of other fields in ClusterMonitoringSpec.


260-265: LGTM!

The change to VolumeClaimConfig simplifies the API by exposing only the necessary storage configuration fields rather than the full PersistentVolumeClaim spec. This is appropriate for a v1alpha1 API.


430-597: Well-structured configuration type.

The PrometheusConfig struct provides comprehensive configuration options with appropriate validation constraints, good documentation, and proper list type annotations for server-side apply support.


762-769: Regex syntax is not validated at admission time.

The regex field documentation states it must be a valid RE2 regular expression, but there's no CEL validation to enforce this. Invalid regex patterns will only fail at runtime when Prometheus attempts to compile them, potentially causing configuration reload failures.

CEL doesn't provide a built-in function to validate regex syntax. Consider either:

  1. Documenting clearly that regex syntax errors will be caught at runtime by the operator
  2. Adding operator-level admission validation to test-compile the regex before accepting the CR

If this is intentionally deferred to runtime validation by CMO, please confirm this is the expected behavior.


607-671: LGTM!

The AdditionalAlertmanagerConfig struct is well-designed with appropriate validation. The clever use of URL parsing in the staticConfigs validation (prepending http:// to validate host:port format) is a good approach.


893-936: LGTM!

Both VolumeClaimConfig and Retention structs have sensible validation bounds and clear documentation. The MinProperties=1 constraint ensures meaningful configuration when these types are used.


854-870: LGTM!

The AuthorizationConfig correctly implements the Kubernetes union pattern with a discriminator field and appropriate XValidation to enforce the union semantics.


673-688: LGTM!

The supporting types (Label, RemoteWriteSpec, SecretKeySelector, RelabelAction, CollectionProfile) are well-defined with appropriate validation constraints, clear documentation, and proper enum handling.

Also applies to: 690-727, 872-891, 938-969

@danielmellado danielmellado force-pushed the prometheus_k8s_config_api branch from 43ae844 to da18977 Compare December 17, 2025 11:57
@danielmellado
Copy link
Contributor Author

The verify-crd-schema and verify-crdify jobs are failing because we intentionally changed the volumeClaimTemplate field from the full corev1.PersistentVolumeClaim type to a simpler VolumeClaimConfig struct with just sizeInGiB and storageClassName.

This was done based on reviewer feedback to provide better admission-time validation and avoid exposing ~40 PVC fields that don't make sense for this use case.

Since this is a TechPreview API (v1alpha1), breaking changes are allowed, and both tools explicitly say it's appropriate to override when you're making intentional changes rather than introducing regressions.

@everettraven do you have any another proposal for this? Besides that, my plan is to split this into smaller changes to make it easier to work/review if that's ok for you.

cc @marioferh

This commit introduces a new API to be introduced as a part of the
migration in CMO from ConfigMap to CRDs

Signed-off-by: Mario Fernandez <mariofer@redhat.com>
Signed-off-by: Daniel Mellado <dmellado@fedoraproject.org>
@danielmellado danielmellado force-pushed the prometheus_k8s_config_api branch from da18977 to b001760 Compare December 17, 2025 13:36
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🧹 Nitpick comments (3)
config/v1alpha1/types_cluster_monitoring.go (1)

610-621: Consider using format validation for consistency.

The name field uses a manual regex pattern for DNS subdomain validation. For consistency with other fields in this file (e.g., SecretKeySelector.Name at line 883, TLSConfig.ServerName at line 824), consider using the format.dns1123Subdomain() validation approach:

-// +kubebuilder:validation:XValidation:rule="self.matches('^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\\\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$')",message="must be a valid DNS subdomain: lowercase alphanumeric, hyphens, or periods"
+// +kubebuilder:validation:XValidation:rule="!format.dns1123Subdomain().validate(self).hasValue()",message="must be a valid DNS subdomain: lowercase alphanumeric, hyphens, or periods"

This leverages the built-in Kubernetes format validator and ensures consistency across the API.

config/v1alpha1/zz_generated.featuregated-crd-manifests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml (1)

895-1882: LGTM! Comprehensive Prometheus configuration with strong validations.

The new prometheusConfig field provides extensive configuration options with appropriate validation:

✅ Security considerations are well-handled:

  • Secrets referenced by name/key (not inline)
  • TLS with proper cert/key mutual validation (lines 1201-1204)
  • Authorization with type-conditional bearerToken validation (lines 983-986)
  • Secure defaults (certificateVerification: Verify)

✅ Cross-field validations are comprehensive:

  • targetLabel required for Replace/HashMod actions (lines 1483-1486)
  • Mutual TLS cert/key must be specified together (lines 1201-1204)

✅ Ranges and constraints are reasonable:

  • Storage: 1 GiB - 16 TiB
  • Retention: 1-365 days
  • Timeouts: 1-600 seconds
  • Body size limit: 1 byte - 1 TB

Optional: Consider stricter host:port validation.

Lines 1042-1049 validate staticConfigs using isURL('http://' + self) to parse host:port. While functional, this accepts entries like "host:port/path" or "host:port?query" since URL parsing includes path/query components. The description states only "host:port" format is expected.

However, Prometheus will reject invalid endpoints at runtime, so this is low-severity validation drift rather than a security or correctness issue.

config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yaml (1)

453-485: Consider consolidating duplicated volumeClaimTemplate schema.

The volumeClaimTemplate schema is duplicated in both alertmanagerConfig.customConfig (lines 453-485) and prometheusConfig (lines 1849-1881) with identical structure and validation rules. This duplication creates maintenance risk—future updates might inadvertently change only one instance, causing API surface inconsistency.

Since this is generated code, verify that the Go type definition properly reuses VolumeClaimConfig for both fields. If the type is already shared in types_cluster_monitoring.go, the duplication here is unavoidable but acceptable. Otherwise, refactor the Go types to share a common struct.

Also applies to: 1849-1881

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between da18977 and b001760.

📒 Files selected for processing (12)
  • config/v1alpha1/tests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml (1 hunks)
  • config/v1alpha1/types_cluster_monitoring.go (4 hunks)
  • config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yaml (2 hunks)
  • config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yaml (2 hunks)
  • config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yaml (2 hunks)
  • config/v1alpha1/zz_generated.deepcopy.go (10 hunks)
  • config/v1alpha1/zz_generated.featuregated-crd-manifests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml (2 hunks)
  • config/v1alpha1/zz_generated.swagger_doc_generated.go (7 hunks)
  • openapi/generated_openapi/zz_generated.openapi.go (13 hunks)
  • payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yaml (2 hunks)
  • payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yaml (2 hunks)
  • payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yaml (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.

Files:

  • config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yaml
  • config/v1alpha1/zz_generated.featuregated-crd-manifests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml
  • config/v1alpha1/tests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml
  • payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yaml
  • openapi/generated_openapi/zz_generated.openapi.go
  • config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yaml
  • config/v1alpha1/zz_generated.swagger_doc_generated.go
  • config/v1alpha1/zz_generated.deepcopy.go
  • payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yaml
  • config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yaml
  • payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yaml
  • config/v1alpha1/types_cluster_monitoring.go
🧬 Code graph analysis (1)
config/v1alpha1/zz_generated.deepcopy.go (1)
config/v1alpha1/types_cluster_monitoring.go (11)
  • AdditionalAlertmanagerConfig (610-671)
  • TLSConfig (804-832)
  • AuthorizationConfig (859-872)
  • PrometheusConfig (434-597)
  • Label (674-688)
  • RemoteWriteSpec (692-727)
  • ContainerResource (305-335)
  • Retention (918-938)
  • RelabelConfig (731-797)
  • SecretKeySelector (876-893)
  • VolumeClaimConfig (897-914)
🔇 Additional comments (29)
config/v1alpha1/types_cluster_monitoring.go (10)

92-104: LGTM!

The prometheusConfig field is well-documented with clear scope (platform Prometheus only) and appropriate optional semantics.


260-265: LGTM!

The simplified VolumeClaimConfig type provides better API ergonomics by exposing only the essential storage configuration fields (size and storage class) rather than the full PVC specification.


430-597: LGTM!

The PrometheusConfig type is well-designed with:

  • Comprehensive configuration options for platform Prometheus
  • Appropriate validation constraints (min/max items, value ranges)
  • Clear documentation for each field
  • Proper use of list types (map, atomic, set) for different scenarios
  • MinProperties validation to prevent empty objects

691-727: LGTM!

The RemoteWriteSpec type is well-designed with appropriate validations for URLs, timeouts, and relabel configurations.


729-797: LGTM!

The RelabelConfig type demonstrates excellent API design with:

  • Struct-level validation enforcing action-specific requirements
  • Pointer type for replacement allowing distinction between omitted and empty values
  • Comprehensive support for Prometheus relabeling actions
  • Appropriate field length limits for practical use cases

799-832: LGTM!

The TLSConfig type is well-designed with appropriate mTLS validation (cert and key must be paired) and secure defaults (Verify for certificate verification).


856-872: LGTM!

The AuthorizationConfig properly implements a discriminated union with validation ensuring bearerToken is provided when type is BearerToken.


874-893: LGTM!

The SecretKeySelector type is well-designed with appropriate validations and the correct structType=atomic marker for ensuring it's treated as an indivisible unit.


895-914: LGTM!

The VolumeClaimConfig type provides a simplified, focused API for storage configuration, exposing only the essential fields (size and storage class) needed for this use case.


916-938: LGTM!

The Retention type is well-designed with reasonable limits (up to 365 days and 16 TiB) and proper validation ensuring at least one retention policy is specified.

config/v1alpha1/tests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml (2)

354-565: LGTM!

The positive test cases provide good coverage of valid PrometheusConfig scenarios, including minimal configuration, scheduling options (nodeSelector, tolerations, topologySpreadConstraints), resource allocation, and external integrations (additionalAlertmanagerConfigs, remoteWrite, externalLabels).


566-776: LGTM!

The negative test cases comprehensively verify validation constraints, including:

  • Empty object rejection (minProperties validation)
  • Maximum item limits across all list fields
  • Minimum item requirements for arrays
  • Error messages properly formatted to match OpenAPI validation output
payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yaml (2)

453-485: LGTM!

The volumeClaimTemplate schema correctly reflects the simplified VolumeClaimConfig structure with appropriate validations for size and storage class name.


895-1882: LGTM!

The prometheusConfig schema correctly represents the comprehensive PrometheusConfig API with:

  • Proper validation constraints (min/max items, value ranges)
  • Correct list types (map, atomic, set) for different fields
  • Complex validations (TLS cert/key pairing, relabel action requirements)
  • Consistent with the Go type definitions
config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-CustomNoUpgrade.crd.yaml (1)

1385-1486: CEL validation for writeRelabelConfigs.targetLabel correctly enforces presence across all CRD variants

The CEL rule now uses has(self.targetLabel) to ensure the target label field exists when action is Replace or HashMod, preventing configs from bypassing validation with null values. This stricter validation is consistently applied across all three generated ClusterMonitoring CRD variants (CustomNoUpgrade, DevPreviewNoUpgrade, TechPreviewNoUpgrade).

payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yaml (1)

453-485: Alertmanager volumeClaimTemplate schema is properly defined

The new simplified volumeClaimTemplate schema looks correct: sizeInGiB is a required int32 with proper bounds (1–16384 GiB), and storageClassName is optional with DNS1123 subdomain validation. The kubebuilder validation tags in the Go type definition align with the CRD schema.

Note: Controller updates needed for this schema change will be in the config-operator repository, not in this API definitions repo.

config/v1alpha1/zz_generated.featuregated-crd-manifests/clustermonitorings.config.openshift.io/ClusterMonitoringConfig.yaml (1)

453-485: LGTM! Simplified storage configuration is appropriate for v1alpha1.

The refactoring of volumeClaimTemplate from a full PersistentVolumeClaim spec to a focused sizeInGiB + storageClassName configuration is well-implemented:

  • Proper validation constraints (1-16384 GiB range, DNS subdomain validation)
  • Required field enforcement via required: [sizeInGiB]
  • Clear admission-time validation

This breaking change is acceptable for v1alpha1 (compatibility level 4) and aligns with the PR's goal of providing better validation while avoiding exposing ~40 PVC fields not relevant to this use case.

config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yaml (1)

1-1915: Schema changes are identical to ClusterMonitoringConfig.yaml.

This CRD manifest (DevPreviewNoUpgrade feature set) contains the same schema changes as reviewed in ClusterMonitoringConfig.yaml:

  1. Simplified volumeClaimTemplate (lines 453-485)
  2. New prometheusConfig field (lines 895-1882)

All validation rules, security considerations, and implementation details are identical. The only difference is the feature gate annotation (line 10: DevPreviewNoUpgrade vs ClusterMonitoringConfig).

Refer to the review comments on the first file for detailed analysis.

config/v1alpha1/zz_generated.crd-manifests/0000_10_config-operator_01_clustermonitorings-TechPreviewNoUpgrade.crd.yaml (1)

453-485: LGTM! Simplified volumeClaimTemplate schema is appropriate for TechPreview.

The refactored schema reduces the API surface from ~40 PVC fields to just the essential sizeInGiB and storageClassName. This aligns with the PR objectives to provide better admission-time validation while keeping the v1alpha1 API focused and maintainable.

config/v1alpha1/zz_generated.swagger_doc_generated.go (3)

121-134: LGTM! Comprehensive Swagger documentation for AdditionalAlertmanagerConfig.

The documentation clearly describes each field's purpose, constraints, and provides helpful examples (e.g., endpoint formats for staticConfigs). This will aid API consumers in understanding the configuration options.


256-275: LGTM! Well-structured Swagger documentation for PrometheusConfig.

The documentation effectively covers all PrometheusConfig fields with clear explanations of purpose, constraints, and use cases. The descriptions for complex fields like additionalAlertmanagerConfigs and remoteWrite provide helpful context for API consumers.


346-354: LGTM! Swagger documentation accurately reflects the simplified VolumeClaimConfig.

The documentation clearly describes the two fields (storageClassName and sizeInGiB) with appropriate constraints and validation details, matching the simplified API surface introduced in this PR.

config/v1alpha1/zz_generated.deepcopy.go (3)

983-1035: LGTM! PrometheusConfig.DeepCopyInto correctly handles all nested structures.

The implementation properly deep-copies all slice and map fields, and correctly handles nested structs:

  • Slice fields (AdditionalAlertmanagerConfigs, RemoteWrite, Resources, etc.) are deep-copied with iteration
  • Map field (NodeSelector) is deep-copied
  • Simple value types (Retention) use shallow copy, which is safe
  • VolumeClaimTemplate properly calls DeepCopyInto

1269-1277: LGTM! VolumeClaimConfig.DeepCopyInto correctly handles pointer field.

The implementation properly handles the optional StorageClassName pointer field:

  • Line 1270 performs shallow copy of the struct
  • Lines 1271-1275 conditionally deep-copy the pointer field when non-nil

This follows the correct pattern for deep-copying structs with optional pointer fields.


14-35: Verify TLSConfig structure for pointer/slice fields.

Lines 17 and 23 perform shallow copies of Authorization and TLSConfig. We confirmed Authorization (DelegatedAuthorization) contains only a Disabled bool field, making the shallow copy safe. However, the TLSConfig struct definition in the openshift/api config/v1alpha1 package could not be definitively located to verify whether it contains pointer or slice fields that require deep copying.

Additionally, these assignments are redundant—line 15's *out = *in already copies all fields shallowly. If deep copies are needed for nested types, the pattern should be refactored to call DeepCopyInto on those fields explicitly.

payload-manifests/crds/0000_10_config-operator_01_clustermonitorings-DevPreviewNoUpgrade.crd.yaml (2)

1345-1494: Verify authentication support for remote write endpoints.

The remoteWrite configuration appears to lack explicit authentication fields (e.g., basic auth, bearer token, OAuth2, or TLS client certificates). Sending metrics to remote endpoints without authentication could expose sensitive monitoring data.

Please verify whether:

  1. Authentication for remote write is handled through a separate mechanism (e.g., service accounts, network policies)
  2. This is a deliberate simplification for the v1alpha1 API with plans to add authentication in future versions
  3. Users are expected to use authenticated endpoints that don't require explicit credentials in the config

If authentication is not supported, consider this a security gap that should be addressed before promoting beyond TechPreview.


1849-1881: Good: Consistent volume configuration across components.

The volumeClaimTemplate structure for Prometheus matches the simplified design used in alertmanagerConfig (lines 453-485), ensuring API consistency. The validation rules (1-16384 GiB, DNS subdomain for storage class) are appropriate.

openapi/generated_openapi/zz_generated.openapi.go (2)

21451-21462: LGTM: VolumeClaimConfig simplification.

The change from corev1.PersistentVolumeClaim to VolumeClaimConfig is intentional per the PR description and provides better admission-time validation by exposing only the relevant fields (sizeInGiB and storageClassName) rather than ~40 PVC fields.


23523-23529: The generated OpenAPI schema is missing the Enum constraint for certificateVerification.

The source Go code (config/v1alpha1/types_cluster_monitoring.go) defines CertificateVerificationType with +kubebuilder:validation:Enum=Verify;SkipVerify, but the generated OpenAPI schema at line 23523-23529 omits this constraint. The generated schema should include Enum: []interface{}{"Verify", "SkipVerify"} to reflect the allowed values, ensuring API clients have accurate schema validation information.

Comment on lines +21257 to +21263
"scheme": {
SchemaProps: spec.SchemaProps{
Description: "scheme defines the URL scheme to use when communicating with Alertmanager instances. Possible values are `HTTP` or `HTTPS`. When omitted, defaults to `HTTP`.",
Type: []string{"string"},
Format: "",
},
},
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for AdditionalAlertmanagerConfig struct definition in source files
find . -type f -name "*.go" -path "*/v1alpha1/*" | head -20

Repository: openshift/api

Length of output: 866


🏁 Script executed:

#!/bin/bash
# Search for AdditionalAlertmanagerConfig struct with Scheme field
rg -A 20 "type AdditionalAlertmanagerConfig struct" --type=go

Repository: openshift/api

Length of output: 2555


🏁 Script executed:

#!/bin/bash
# Search for Scheme field with validation markers
rg -B 2 -A 2 "Scheme.*string" --type=go | grep -A 2 -B 2 "kubebuilder"

Repository: openshift/api

Length of output: 39


🏁 Script executed:

#!/bin/bash
# Check for RetentionPolicy to compare Enum implementation
rg -A 20 "type RetentionPolicy struct" --type=go

Repository: openshift/api

Length of output: 1831


🏁 Script executed:

#!/bin/bash
# Get the full AdditionalAlertmanagerConfig struct definition
rg -A 50 "type AdditionalAlertmanagerConfig struct" config/v1alpha1/types_cluster_monitoring.go

Repository: openshift/api

Length of output: 3679


🏁 Script executed:

#!/bin/bash
# Search for Scheme field in the monitoring config file specifically
rg -B 2 -A 2 "Scheme\s" config/v1alpha1/types_cluster_monitoring.go

Repository: openshift/api

Length of output: 487


Regenerate OpenAPI schema—source has Enum constraint that generated schema lacks.

The source struct in config/v1alpha1/types_cluster_monitoring.go properly defines +kubebuilder:validation:Enum=HTTP;HTTPS for the Scheme field, but the generated schema at lines 21257-21263 omits the Enum constraint in the output. The generated schema shows only Type, Format, and Description, missing the Enum field that should enumerate the valid values (HTTP, HTTPS). Regenerate the OpenAPI schema file to sync with the source validation markers.

Comment on lines +21494 to +21500
"type": {
SchemaProps: spec.SchemaProps{
Description: "type specifies the authentication type to use. Valid values are \"None\" (no authentication) and \"BearerToken\" (bearer token authentication). When set to None, no authentication credentials are sent. When set to BearerToken, the bearerToken field must be specified.",
Type: []string{"string"},
Format: "",
},
},
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find AuthorizationConfig struct definition
find . -type f -name "*.go" -not -path "*/vendor/*" -not -path "*/zz_generated*" | xargs grep -l "type AuthorizationConfig struct" | head -5

Repository: openshift/api

Length of output: 102


🏁 Script executed:

#!/bin/bash
# Search for AuthorizationConfig and its Type field with validation markers
rg "type AuthorizationConfig struct" -A 20 --type=go | head -60

Repository: openshift/api

Length of output: 2064


🏁 Script executed:

#!/bin/bash
# Check the generated schema around the type field for Enum constraint
sed -n '21490,21510p' openapi/generated_openapi/zz_generated.openapi.go

Repository: openshift/api

Length of output: 1163


🏁 Script executed:

#!/bin/bash
# Check what AuthorizationType is defined as
rg "type AuthorizationType" -A 10 --type=go

Repository: openshift/api

Length of output: 1966


🏁 Script executed:

#!/bin/bash
# Check if there are any validation tags or const definitions for AuthorizationType
rg "AuthorizationType" --type=go -B 2 -A 2 | grep -E "(const|Enum|None|BearerToken)" | head -20

Repository: openshift/api

Length of output: 1101


The Enum constraint is missing from the generated schema despite validation existing in the source.

The source code defines AuthorizationType with +kubebuilder:validation:Enum=None;BearerToken, and the type field in AuthorizationConfig struct uses this type. However, the generated OpenAPI schema (lines 21494-21500) omits the Enum constraint. The schema description documents the valid values but provides no machine-readable Enum validation. Add the validation marker directly to the struct field to ensure it propagates to the generated schema:

Type AuthorizationType `json:"type,omitempty" kubebuilder:validation:Enum=None;BearerToken`

Comment on lines +22996 to 23002
"logLevel": {
SchemaProps: spec.SchemaProps{
Description: "retentionType sets the type of retention policy. Currently, the only valid policies are retention by number of backups (RetentionNumber), by the size of backups (RetentionSize). More policies or types may be added in the future. Empty string means no opinion and the platform is left to choose a reasonable default which is subject to change without notice. The current default is RetentionNumber with 15 backups kept.\n\nPossible enum values:\n - `\"RetentionNumber\"` sets the retention policy based on the number of backup files saved\n - `\"RetentionSize\"` sets the retention policy based on the total size of the backup files saved",
Default: "",
Description: "logLevel defines the verbosity of logs emitted by Prometheus. This field allows users to control the amount and severity of logs generated, which can be useful for debugging issues or reducing noise in production environments. Allowed values are Error, Warn, Info, and Debug. When set to Error, only errors will be logged. When set to Warn, both warnings and errors will be logged. When set to Info, general information, warnings, and errors will all be logged. When set to Debug, detailed debugging information will be logged. When omitted, this means no opinion and the platform is left to choose a reasonable default, that is subject to change over time. The current default value is `Info`.",
Type: []string{"string"},
Format: "",
Enum: []interface{}{"RetentionNumber", "RetentionSize"},
},
},
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for PrometheusConfig struct definition and LogLevel field

# First, find files containing PrometheusConfig
fd -type f -name "*.go" -path "*/v1alpha1/*" | xargs grep -l "type PrometheusConfig struct" 2>/dev/null | head -5

# Also search more broadly
fd -type f -name "*prometheus*.go" -o -name "*monitoring*.go" | head -20

Repository: openshift/api

Length of output: 398


🏁 Script executed:

#!/bin/bash
# Once we locate the files, check for LogLevel field with validation tags
rg "type PrometheusConfig struct" -A 100 --type=go | head -150

Repository: openshift/api

Length of output: 10780


🏁 Script executed:

#!/bin/bash
# Search for any kubebuilder validation markers on logLevel
rg -B2 -A2 "LogLevel\s" --type=go -g "*/v1alpha1/*" | head -50

Repository: openshift/api

Length of output: 4136


🏁 Script executed:

#!/bin/bash
# Search for collectionProfile in PrometheusConfig and check for Enum validation
rg -B5 -A5 "collectionProfile" config/v1alpha1/types_cluster_monitoring.go

# Also search for any CollectionProfile type definition
rg "type CollectionProfile" -A5 config/v1alpha1/types_cluster_monitoring.go

Repository: openshift/api

Length of output: 1546


🏁 Script executed:

#!/bin/bash
# Get the complete CollectionProfile type definition with any validation markers
rg "^type CollectionProfile" -A15 config/v1alpha1/types_cluster_monitoring.go

Repository: openshift/api

Length of output: 695


🏁 Script executed:

#!/bin/bash
# Check the OpenAPI generated file for collectionProfile field around the mentioned lines
sed -n '23100,23140p' openapi/generated_openapi/zz_generated.openapi.go

Repository: openshift/api

Length of output: 3000


Add Enum constraint to collectionProfile type definition.

The CollectionProfile type (used for the collectionProfile field) lacks the +kubebuilder:validation:Enum=Full;Minimal marker despite documenting allowed values. Add this validation marker to align with similar types like LogLevel and ensure API schema validation.

Note: The logLevel field's OpenAPI schema should also include the Enum constraint (currently missing), though the source type already has +kubebuilder:validation:Enum=Error;Warn;Info;Debug defined—this appears to be an OpenAPI generation issue.

Comment on lines +23119 to +23125
"collectionProfile": {
SchemaProps: spec.SchemaProps{
Description: "collectionProfile defines the metrics collection profile that Prometheus uses to collect metrics from the platform components. Supported values are `Full` or `Minimal`. In the `Full` profile (default), Prometheus collects all metrics that are exposed by the platform components. In the `Minimal` profile, Prometheus only collects metrics necessary for the default platform alerts, recording rules, telemetry and console dashboards. When omitted, this means no opinion and the platform is left to choose a reasonable default, which is subject to change over time. The default value is `Full`.",
Type: []string{"string"},
Format: "",
},
},
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# Search for CollectionProfile field definition in prometheus-related files
rg -A5 -B2 'CollectionProfile.*string' --type=go -g '**/v1alpha1/*prometheus*.go'

Repository: openshift/api

Length of output: 170


🏁 Script executed:

# Search for CollectionProfile field definition in Go files (excluding generated files)
rg 'CollectionProfile' --type=go -g '!zz_generated*' -A3 -B1

Repository: openshift/api

Length of output: 3111


Add Enum constraint to generated OpenAPI spec.

The source struct in config/v1alpha1/types_cluster_monitoring.go includes // +kubebuilder:validation:Enum=Full;Minimal, which correctly constrains CollectionProfile to "Full" or "Minimal". However, the generated OpenAPI schema is missing the Enum field in SchemaProps (lines 23119-23125). The OpenAPI spec should include Enum: []interface{}{"Full", "Minimal"} to accurately reflect the validation constraints from the source struct.

🤖 Prompt for AI Agents
In openapi/generated_openapi/zz_generated.openapi.go around lines 23119 to
23125, the SchemaProps for "collectionProfile" is missing the Enum constraint
generated from the kubebuilder tag; add Enum: []interface{}{"Full", "Minimal"}
to the SchemaProps for collectionProfile so the OpenAPI reflects the validation,
keeping the Type as "string" and ensuring the enum values are interface{}
entries matching the allowed strings.

Comment on lines +23488 to +23531
func schema_openshift_api_config_v1alpha1_TLSConfig(ref common.ReferenceCallback) common.OpenAPIDefinition {
return common.OpenAPIDefinition{
Schema: spec.Schema{
SchemaProps: spec.SchemaProps{
Description: "TLSConfig represents TLS configuration for Alertmanager connections. At least one TLS configuration option must be specified. For mutual TLS (mTLS), both cert and key must be specified together, or both omitted.",
Type: []string{"object"},
Properties: map[string]spec.Schema{
"ca": {
SchemaProps: spec.SchemaProps{
Description: "ca is an optional CA certificate to use for TLS connections. When omitted, the system's default CA bundle is used.",
Default: map[string]interface{}{},
Ref: ref("github.com/openshift/api/config/v1alpha1.SecretKeySelector"),
},
},
"cert": {
SchemaProps: spec.SchemaProps{
Description: "cert is an optional client certificate to use for mutual TLS connections. When omitted, no client certificate is presented.",
Default: map[string]interface{}{},
Ref: ref("github.com/openshift/api/config/v1alpha1.SecretKeySelector"),
},
},
"key": {
SchemaProps: spec.SchemaProps{
Description: "key is an optional client key to use for mutual TLS connections. When omitted, no client key is used.",
Default: map[string]interface{}{},
Ref: ref("github.com/openshift/api/config/v1alpha1.SecretKeySelector"),
},
},
"serverName": {
SchemaProps: spec.SchemaProps{
Description: "serverName is an optional server name to use for TLS connections. When specified, must be a valid DNS subdomain as per RFC 1123. When omitted, the server name is derived from the URL. Must be between 1 and 253 characters in length.",
Type: []string{"string"},
Format: "",
},
},
"certificateVerification": {
SchemaProps: spec.SchemaProps{
Description: "certificateVerification determines the policy for TLS certificate verification. Allowed values are \"Verify\" (performs certificate verification, secure) and \"SkipVerify\" (skips verification, insecure). When omitted, defaults to \"Verify\" (secure certificate verification is performed).",
Type: []string{"string"},
Format: "",
},
},
},
},
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Check for CEL validation or custom webhook validation for TLSConfig

# First, find the TLSConfig type definition in v1alpha1
fd -e go -g '**/v1alpha1/*' | head -20

# Search for TLSConfig type definition with validation rules
rg -A20 'type TLSConfig struct' --type=go

# Check for CEL/XValidation rules for TLSConfig
rg 'XValidation|CEL' -B2 -A5 --type=go | grep -A10 -B10 -i tlsconfig

# Also search for any "at least one" or field requirement validation
rg 'self.ca|self.cert|self.key' --type=go | head -20

Repository: openshift/api

Length of output: 8774


🏁 Script executed:

#!/bin/bash
# Check the actual generated schema around line 23488-23531
# to see if MinProperties or validation rules are present

# Look at the exact lines mentioned in the review
sed -n '23480,23535p' openapi/generated_openapi/zz_generated.openapi.go

# Also check if there are any min/max properties or validation rules nearby
sed -n '23450,23550p' openapi/generated_openapi/zz_generated.openapi.go | grep -E 'MinProperties|XValidation|MinItems|Required'

# Check how other constraints like MaxLength/MinLength are represented
sed -n '23480,23535p' openapi/generated_openapi/zz_generated.openapi.go | grep -A2 -B2 'serverName'

Repository: openshift/api

Length of output: 3122


MinProperties constraint missing from generated OpenAPI schema.

The source type definition in config/v1alpha1/types_cluster_monitoring.go includes +kubebuilder:validation:MinProperties=1, which enforces the "at least one field" requirement at the CRD level. However, this constraint is not present in the generated OpenAPI schema. While the description correctly documents this requirement, the schema itself lacks the structural representation (MinProperties field) that OpenAPI consumers would expect to see. The validation rules for mTLS cert/key pairing are also missing from the generated schema.

🤖 Prompt for AI Agents
In openapi/generated_openapi/zz_generated.openapi.go around lines 23488 to
23531, the generated schema is missing the MinProperties=1 constraint and the
mTLS cert/key pairing validation; update the SchemaProps for TLSConfig to
include MinProperties: 1, and add OpenAPI validation to express the mutual TLS
pairing (for example, a oneOf/anyOf block that requires both "cert" and "key"
together or forbids one without the other — e.g., one branch with required:
["cert","key"] and another branch that disallows either field being present
alone using not + anyOf(required:["cert"], required:["key"])); ensure these
additions fit the existing spec.Schema structure so the CRD schema enforces “at
least one property” and the cert/key pairing rule.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Dec 17, 2025

@danielmellado: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/okd-scos-images b001760 link true /test okd-scos-images
ci/prow/verify-crd-schema b001760 link true /test verify-crd-schema
ci/prow/verify-crdify b001760 link true /test verify-crdify

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Comment on lines +459 to +460
// +kubebuilder:validation:Minimum=1
// +kubebuilder:validation:Maximum=1099511627776
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these reasonable?

When might someone want to reasonably set this to 1 byte? When might they want to set this 1 terabyte?

// Maximum length is 253 characters.
// +kubebuilder:validation:MaxLength=253
// +kubebuilder:validation:MinLength=1
// +kubebuilder:validation:XValidation:rule="self.matches('^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\\\\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$')",message="must be a valid DNS subdomain: lowercase alphanumeric, hyphens, or periods"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is an existing format utility for DNS 1123 subdomain names.

Example:

// +kubebuilder:validation:XValidation:rule="!format.dns1123Subdomain().validate(self).hasValue()",message="a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character."

// instances.
// Possible values are `HTTP` or `HTTPS`.
// When omitted, defaults to `HTTP`.
// +kubebuilder:validation:Enum=HTTP;HTTPS
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you use a type alias, we generally prefer that this marker is placed on the type alias.

// +kubebuilder:validation:MaxItems=10
// +listType=map
// +listMapKey=name
WriteRelabelConfigs []RelabelConfig `json:"writeRelabelConfigs,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minimum length of 1 on this if [] has the same semantic meaning as omitted please.

// +kubebuilder:validation:MaxLength=1000
Regex string `json:"regex,omitempty"`
// targetLabel is the target label name where the result is written.
// Required for Replace and HashMod actions (enforced by validation).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to mention that this is enforced by validation

Suggested change
// Required for Replace and HashMod actions (enforced by validation).
// Required for Replace and HashMod actions.

}

// VolumeClaimConfig defines the configuration for a PersistentVolumeClaim used for storage.
// +kubebuilder:validation:MinProperties=1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessary because you already have one field marked as required.

// Prometheus automatically deletes data older than this duration.
// When omitted, the default is 15 days.
// Minimum value is 1 day.
// Maximum value is 365 days (1 year).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any use cases to retain metrics for longer than a year?

Comment on lines -254 to +265
VolumeClaimTemplate *v1.PersistentVolumeClaim `json:"volumeClaimTemplate,omitempty"`
VolumeClaimTemplate VolumeClaimConfig `json:"volumeClaimTemplate,omitempty,omitzero"`
Copy link
Contributor

@everettraven everettraven Dec 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Be cautious of changing this.

If this changes the serialization of the type you'll need to upversion this to v1alpha2.

I had forgotten that *v1.PersistentVolumeClaim had been used in the past so it may be OK to stay consistent. Before promoting to a v1 though I'd recommend converting to a custom type so you can enforce better validations at admission time.

Copy link
Contributor

@everettraven everettraven left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, the structure of this seems to be shaping up nicely.

I've got a handful more questions/comments on the API changes as-is. Once those are addressed I can do another comprehensive review to look at the tests and things like that.

Thanks @danielmellado for opening this new PR - GitHub is much happier loading this one now :P

@everettraven
Copy link
Contributor

The verify-crd-schema and verify-crdify jobs are failing because we intentionally changed the volumeClaimTemplate field from the full corev1.PersistentVolumeClaim type to a simpler VolumeClaimConfig struct with just sizeInGiB and storageClassName.

This was done based on reviewer feedback to provide better admission-time validation and avoid exposing ~40 PVC fields that don't make sense for this use case.

Since this is a TechPreview API (v1alpha1), breaking changes are allowed, and both tools explicitly say it's appropriate to override when you're making intentional changes rather than introducing regressions.

@everettraven do you have any another proposal for this? Besides that, my plan is to split this into smaller changes to make it easier to work/review if that's ok for you.

I left a comment on this but I'll explicitly reply here as well - if if changes the serialization of the type it is not OK to do in v1alpha1 if this has shipped at all. Older clients that relied on that field being the previous type would fail to serialize a valid object into a request body for the updated version of the resource.

The correct way to make this change is to up-version from v1alpha1 to v1alpha2.

Re: splitting into smaller changes, I'll leave that up to you. As-is this PR is pretty big but I think fairly manageable now that we've gotten a large majority of the structural changes for this API sorted out. The previous PR was just getting unwieldy due to the large number of review comments on it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants