-
Notifications
You must be signed in to change notification settings - Fork 462
AGENT-1391: Use TLS cert for the InternalReleaseImage registry #5483
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This patch injects the specific tls certificate generated by the installer into the IRI registry. It also adds the root CA certificate to all the master nodes, to allow pulling the images from other nodes registry
|
@andfasano: This pull request references AGENT-1391 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
6e1c618 to
eab86cb
Compare
pkg/controller/internalreleaseimage/internalreleaseimage_renderer.go
Outdated
Show resolved
Hide resolved
pkg/controller/internalreleaseimage/internalreleaseimage_renderer.go
Outdated
Show resolved
Hide resolved
pkg/controller/internalreleaseimage/internalreleaseimage_renderer.go
Outdated
Show resolved
Hide resolved
eab86cb to
ab9024b
Compare
reviewed bootstrap/controller logic to allow generation and management of multiple machine configs
ab9024b to
9f74756
Compare
members of the agent team
|
@andfasano: This pull request references AGENT-1391 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, so, the CA wasn't distributed to workers? Strange... but makes sense, as I see the regular, non-bootstrap, MCS doesn't append the certicates as part of the ignition config. If so, maybe that's because we are already distributing them as part of the other MCs for worker pools?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Neither to masters AFAICS.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, we removed this being distributed to nodes since it is essentially non functional once the node joins the cluster; and we did not want future rotations to cause an MC rollout. The rootCA is distributed via user-data-secret and not actually written/managed via MachineConfigs. I think adding it back makes sense here; although perhaps you could throw root-ca in templates/common? Perhaps we should move it to cert_writer as a follow-up, I can see this causing unexpected reboots during upgrades to releases with this PR?(or add a node disruption policy for these files too?) Thoughts, @yuqi-zhang?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: the 4.21 release for this feature (NoRegistryClusterInstall) will be a TP, so no upgrades will be involved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of curiosity, how often does this get rotated? I do agree that this should probably live in the cert rotation path, but since this is only used for IRI enabled clusters (which should be a very small percentage) maybe we're ok with that being a nodedisruptionpolicy
| client: mcfgClient, | ||
|
|
||
| client: mcfgClient, | ||
| kubeClient: kubeClient, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We try to avoid using the client and use listers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any specific reason (ie caching) - I suppose you're referring mainly to resource fetching? Consider that the activity for this controller is pretty limited, looking for very few resources. IIRC also the testing was easier with the regular fake clients, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are some API limits in the openshift tests that catch controllers making too many client resource requests. I don't think you'd hit those, but maybe we can make a followup improvement for this
| @@ -0,0 +1,5 @@ | |||
| mode: 0644 | |||
| path: "/etc/pki/ca-trust/source/anchors/root-ca.crt" | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would you consider prefixing this with iri as well to keep it consistent, and for anyone looking at the file to see what it's coming from? Or is the name a hard requirement
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of curiosity, how often does this get rotated? I do agree that this should probably live in the cert rotation path, but since this is only used for IRI enabled clusters (which should be a very small percentage) maybe we're ok with that being a nodedisruptionpolicy
| @@ -0,0 +1,5 @@ | |||
| mode: 0644 | |||
| path: "/etc/pki/ca-trust/source/anchors/root-ca.crt" | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as above applies, preferrably the naming on-disk is indicative of what it's used for, since root-ca is a bit of an overloaded term
| client: mcfgClient, | ||
|
|
||
| client: mcfgClient, | ||
| kubeClient: kubeClient, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are some API limits in the openshift tests that catch controllers making too many client resource requests. I don't think you'd hit those, but maybe we can make a followup improvement for this
|
/payload-job periodic-ci-openshift-machine-config-operator-release-4.21-periodics-e2e-aws-mco-disruptive-techpreview-1of2 periodic-ci-openshift-machine-config-operator-release-4.21-periodics-e2e-aws-mco-disruptive-techpreview-2of2 periodic-ci-openshift-release-master-ci-4.21-e2e-aws-ovn-techpreview-serial-1of3 periodic-ci-openshift-release-master-ci-4.21-e2e-aws-ovn-techpreview-serial-2of3 periodic-ci-openshift-release-master-ci-4.21-e2e-aws-ovn-techpreview-serial-3of3 |
|
@djoshy: trigger 5 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/730b8ff0-d6ce-11f0-9c07-a59bf5b671ce-0 |
yuqi-zhang
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/acknowledge-critical-fixes-only
/hold
For tech preview and install support, I'm fine with merging this. I would like to see followup plans around:
- cert rotation and MCD-side rotation upgrade handling
- renaming the certificates and filepaths to be more indicative of what they're used for
- consider using less direct API calls for object fetching, etc.
Holding until we are certain that payloads will continue to work, as we would not like to break this right before branching, even if it is TP
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: andfasano, yuqi-zhang The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/acknowledge-critical-fixes-only |
|
/payload-job periodic-ci-openshift-release-master-ci-4.21-e2e-aws-ovn-techpreview-serial-2of3 periodic-ci-openshift-release-master-ci-4.21-e2e-aws-ovn-techpreview-serial-3of3 The first run seems promising, I will let the failed ones try again, but the original failures seem unrelated to this. @andfasano in your AM, if you're comfortable with the state of the PR, feel free to unhold and mark verified if you are doing the verification. All the other labels should be applied so hopefully it'll merge |
|
@yuqi-zhang: trigger 2 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/6a9c82b0-d6fc-11f0-8fc2-ccb5502fd2c6-0 |
Thank you @yuqi-zhang, see https://issues.redhat.com/browse/AGENT-1392 for the cert rotation. I will followup asap on the other two points (https://issues.redhat.com/browse/AGENT-1395) |
|
/verified later @andfasano |
|
@andfasano: This PR has been marked to be verified later by DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/test e2e-aws-ovn |
|
@andfasano: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
/test e2e-gcp-op-single-node |
0bc0e71
into
openshift:main
|
/cherry-pick release-4.21 |
|
@andfasano: new pull request created: #5495 DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
- What I did
This patch uses the TLS cert generated by the installer (see openshift/installer#10147) for the InternalReleaseImage registry.
Also, the root CA, used to sign the IRI cert, is added to the trust store of all the nodes.
This patch also adds the OWNERS file for the InternalReleaseImage controller folder, managed by the agent team
- How to verify it
Add the InternalReleaseImage resource in the bootstrap manifests dir (/etc/mcc/bootstrap) and enable the NoRegistryClusterInstall feature gate.
- Description for the changelog
Add support for InternalReleaseImage resource