Skip to content

Comments

OCPCLOUD-3346: tls: use centralized TLS#1456

Open
RadekManak wants to merge 7 commits intoopenshift:mainfrom
RadekManak:feature-centralized-tsl-endpoints
Open

OCPCLOUD-3346: tls: use centralized TLS#1456
RadekManak wants to merge 7 commits intoopenshift:mainfrom
RadekManak:feature-centralized-tsl-endpoints

Conversation

@RadekManak
Copy link
Contributor

@RadekManak RadekManak commented Jan 26, 2026

  • Serve machine-api-operator metrics directly over HTTPS (:8443) using controller-runtime’s metrics server with delegated authn/authz (WithAuthenticationAndAuthorization), and remove the MAO kube-rbac-proxy sidecar.
  • Add TLS profile awareness for MAO metrics:
    • read APIServer/cluster TLS profile on startup,
    • configure min TLS/ciphers from that profile,
    • watch for TLS profile changes and trigger shutdown so the pod restarts with updated TLS settings.
  • Propagate the same TLS profile to controller kube-rbac-proxy sidecars (machine, machineset, mhc) by generating --tls-min-version and profile-derived --tls-cipher-suites args.
  • Update manifests accordingly:
    • deployment ports/volume mounts/env (METRICS_PORT=8443) for direct secure serving,
    • RBAC to watch config.openshift.io/apiservers.
  • Include supporting dependency/vendor updates and minor follow-ups:
    • dependency bumps (controller-runtime, openshift/api, openshift/client-go, etc.),
    • go-build.sh root-dir handling fix,
    • lint/import cleanup (pkg/webhooks/machine_webhook.go, context import updates).

Notes

  • Provider/controller metrics remain behind kube-rbac-proxy and continue using the existing namespace/metrics authorization model.
  • MAO direct /metrics auth uses delegated token/SAR checks; scraper access relies on existing cluster-monitoring prometheus-k8s cluster RBAC.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jan 26, 2026
@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Jan 26, 2026

@RadekManak: This pull request references OCPCLOUD-3346 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

Summary

  • Replace the MAO metrics kube‑rbac‑proxy sidecar with direct HTTPS in the operator binary, using the serving cert mounted from the MAO secret.
  • Watch the APIServer TLS profile and trigger a controlled shutdown so MAO restarts and picks up TLS changes.
  • Propagate the APIServer TLS profile into machine-api-controllers kube‑rbac‑proxy args (cipher suites + min TLS), with unit coverage.

Details

  • Direct MAO metrics TLS
  • MAO now listens on :8443 and serves /metrics via ListenAndServeTLS using /etc/tls/private/tls.crt|tls.key.
  • The deployment drops the kube‑rbac‑proxy sidecar, mounts the serving cert into /etc/tls/private, and exposes port 8443.
  • RBAC is updated to allow reading apiservers for TLS profile fetch.
  • TLS profile reload
  • MAO fetches the APIServer TLS profile at startup and builds a tls.Config.
  • A config informer watches APIServer updates and triggers shutdown on profile changes.
  • Centralized proxy TLS for controllers
  • OperatorConfig now carries the TLS profile.
  • machine-api-controllers kube‑rbac‑proxy args are generated from the profile (--tls-cipher-suites, --tls-min-version),
  • Tests updated to include APIServer presence and TLS profile expectations; a focused test validates proxy TLS args.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@RadekManak
Copy link
Contributor Author

/assign @damdo

Copy link
Member

@damdo damdo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Looks good, a couple of minor points.

var tlsProfile *osconfigv1.TLSProfileSpec
apiServer, err := optr.osClient.ConfigV1().APIServers().Get(context.Background(), "cluster", metav1.GetOptions{})
if err != nil {
klog.Warningf("Failed to fetch APIServer, using default TLS profile: %v", err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we return here?
What would happen otherwise?

} else {
profile, err := utiltls.GetTLSProfileSpec(apiServer.Spec.TLSSecurityProfile)
if err != nil {
klog.Warningf("Failed to get TLS profile spec, using defaults: %v", err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we return here?
What would happen otherwise?

}

func newKubeProxyContainers(image string, withMHCProxy bool) []corev1.Container {
func newKubeProxyContainers(image string, withMHCProxy bool, tlsProfile *configv1.TLSProfileSpec) []corev1.Container {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not pass the pointer but the whole tlsProfile

Comment on lines 873 to 875
// Use defaults if no profile provided
ciphers := utiltls.DefaultTLSCiphers
minVersion := utiltls.DefaultMinTLSVersion
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jan 27, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from damdo. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Remove the kube-rbac-proxy sidecar, mount the serving cert, and
restart the operator on APIServer TLS profile changes.
Capture the APIServer TLS profile in operator config and use it to
configure kube-rbac-proxy TLS args, with unit coverage.
Add unit tests to verify TLS configuration handling in
newKubeProxyContainer, including tests for TLS 1.2 with cipher suites
and TLS 1.3 without cipher suites.
@RadekManak RadekManak force-pushed the feature-centralized-tsl-endpoints branch from 4e665c0 to 24eed11 Compare February 18, 2026 14:55
@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Feb 18, 2026

@RadekManak: This pull request references OCPCLOUD-3346 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

  • Serve machine-api-operator metrics directly over HTTPS (:8443) using controller-runtime’s metrics server with delegated authn/authz (WithAuthenticationAndAuthorization), and remove the MAO kube-rbac-proxy sidecar.
  • Add TLS profile awareness for MAO metrics:
  • read APIServer/cluster TLS profile on startup,
  • configure min TLS/ciphers from that profile,
  • watch for TLS profile changes and trigger shutdown so the pod restarts with updated TLS settings.
  • Propagate the same TLS profile to controller kube-rbac-proxy sidecars (machine, machineset, mhc) by generating --tls-min-version and profile-derived --tls-cipher-suites args.
  • Update manifests accordingly:
  • deployment ports/volume mounts/env (METRICS_PORT=8443) for direct secure serving,
  • RBAC to watch config.openshift.io/apiservers.
  • Include supporting dependency/vendor updates and minor follow-ups:
  • dependency bumps (controller-runtime, openshift/api, openshift/client-go, etc.),
  • go-build.sh root-dir handling fix,
  • lint/import cleanup (pkg/webhooks/machine_webhook.go, context import updates).

Notes

  • Provider/controller metrics remain behind kube-rbac-proxy and continue using the existing namespace/metrics authorization model.
  • MAO direct /metrics auth uses delegated token/SAR checks; scraper access relies on existing cluster-monitoring prometheus-k8s cluster RBAC.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@RadekManak RadekManak force-pushed the feature-centralized-tsl-endpoints branch from 24eed11 to 6b35a05 Compare February 18, 2026 15:25
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Feb 18, 2026

@RadekManak: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/unit 6b35a05 link true /test unit
ci/prow/e2e-aws-ovn 6b35a05 link true /test e2e-aws-ovn
ci/prow/e2e-metal-ipi-ovn-ipv6 6b35a05 link true /test e2e-metal-ipi-ovn-ipv6
ci/prow/e2e-aws-operator 6b35a05 link true /test e2e-aws-operator
ci/prow/e2e-aws-ovn-upgrade 6b35a05 link true /test e2e-aws-ovn-upgrade

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants