OCPBUGS-69434: openshift: CAPI IPAM TechPreviewNoUpgrade: set webhooks failurepolicy: Ignore #256
…olicy: Ignore

Add functions to set the failurePolicy to Ignore for both mutating and validating webhooks handling IPAM resources.

During bootstrap, the bootstrap node's Kube API Server (KAS) receives IPAM create requests but is unable to reach the webhooks in the Cluster API namespace. This is because the bootstrap node doesn't have a route to the pods, as it doesn't have access to the pod networks.

If failurePolicy is set to Fail, the KAS cannot reach the webhook endpoints and the request fails, preventing creation of IPAddress and IPAddressClaim resources.

This causes a chicken-and-egg problem as it prevents IPAM provisioning for the workers which won't start without their IP addresses being allocated.

Setting failurePolicy to Ignore allows the resources to be created even when the webhooks are unreachable during bootstrap, matching what Machine API also does.

More context: https://redhat-internal.slack.com/archives/C0A2M43S199/p1765540108488539
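As a rough illustration of the change described above (not the code in this PR), the sketch below patches a webhook configuration manifest, held as a plain Python dict, so that any webhook whose rules cover the `ipam.cluster.x-k8s.io` API group gets `failurePolicy: Ignore`. The function name, the webhook names in the sample manifest, and the choice to match on API group are all assumptions made for illustration.

```python
import copy

# API group that serves IPAddress and IPAddressClaim in Cluster API.
IPAM_GROUP = "ipam.cluster.x-k8s.io"


def set_ipam_failure_policy_ignore(webhook_config: dict) -> dict:
    """Return a copy of a Mutating/ValidatingWebhookConfiguration manifest
    in which every webhook that handles IPAM resources has failurePolicy: Ignore.

    Hypothetical helper: the real PR may select webhooks differently.
    """
    patched = copy.deepcopy(webhook_config)  # leave the caller's manifest untouched
    for webhook in patched.get("webhooks", []):
        groups = {
            group
            for rule in webhook.get("rules", [])
            for group in rule.get("apiGroups", [])
        }
        if IPAM_GROUP in groups:
            webhook["failurePolicy"] = "Ignore"
    return patched


# Sample manifest; the webhook names are illustrative only.
example = {
    "apiVersion": "admissionregistration.k8s.io/v1",
    "kind": "ValidatingWebhookConfiguration",
    "metadata": {"name": "example-validating-webhook-configuration"},
    "webhooks": [
        {
            "name": "validation.ipaddressclaim.example",
            "failurePolicy": "Fail",
            "rules": [{"apiGroups": [IPAM_GROUP], "resources": ["ipaddressclaims"]}],
        },
        {
            "name": "validation.cluster.example",
            "failurePolicy": "Fail",
            "rules": [{"apiGroups": ["cluster.x-k8s.io"], "resources": ["clusters"]}],
        },
    ],
}

patched = set_ipam_failure_policy_ignore(example)
```

In this sketch only the IPAM webhook is relaxed; webhooks for other API groups keep whatever failurePolicy they already had.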
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: damdo
@damdo: This PR was included in a payload test run from openshift/installer#10158
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/25d712e0-d9b7-11f0-915d-938ad3de1c37-0
@damdo: This PR was included in a payload test run from openshift/installer#10158
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/290b3bd0-d9b7-11f0-84df-7de543f5755b-0
mdbooth left a comment
Assuming we actually want to do this, I approve of this method of achieving it.
/lgtm
# If failurePolicy is set to Fail, the KAS cannot reach the webhook endpoints and the request fails, preventing creation of IPAddress and IPAddressClaim resources.
#
# This causes a chicken-and-egg problem as it prevents IPAM provisioning
# for the workers which won't start without their IP addresses being allocated.
Technically s/start/be created/, but not worth updating unless this needs a respin anyway.
/hold
What validations are in place on IPAddress and IPAddressClaim that require webhooks to be in place? What is the risk here? Is there anything in the validations that would absolutely be required that prevents IPAddress objects from being valid unless they've been through the webhook?
Reviewing the webhooks, what if we carried a patch to generate a proper CEL-based CRD schema that implements all of the same validations, and disabled the webhooks for these types?
@JoelSpeed that's already the plan.
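For context on the CEL suggestion: CRD schemas accept `x-kubernetes-validations` entries that the API server evaluates itself, so such rules do not depend on a webhook endpoint being reachable. A minimal sketch of attaching one rule to a schema fragment follows; the helper name, the schema fragment, and the immutability rule are hypothetical and are not taken from the actual IPAM webhook validations.

```python
import copy


def add_cel_rule(schema: dict, rule: str, message: str) -> dict:
    """Return a copy of an OpenAPI schema fragment with one CEL validation rule appended.

    Hypothetical helper for illustration; a real patch would likely be
    generated from markers on the API types instead.
    """
    patched = copy.deepcopy(schema)
    patched.setdefault("x-kubernetes-validations", []).append(
        {"rule": rule, "message": message}
    )
    return patched


# Illustrative fragment of a CRD's spec schema.
spec_schema = {"type": "object", "properties": {"poolRef": {"type": "object"}}}

# "self == oldSelf" is a CEL transition rule: on update, the new value must
# equal the stored one, which makes the field effectively immutable.
immutable_spec = add_cel_rule(spec_schema, "self == oldSelf", "spec is immutable")
```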
/retest
@damdo: all tests passed! Full PR test history. Your PR dashboard.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
@damdo: This pull request references Jira Issue OCPBUGS-69434, which is invalid:
The bug has been updated to refer to the pull request using the external bug tracker.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
Had a chat out of band with @JoelSpeed
/unhold
/label acknowledge-critical-fixes-only
Fixes an issue that breaks vsphere-static installs on TPNU
I've been doing some tests and the results are here :ballot_box_with_ballot:
📫 result has come back green: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/openshift-cluster-api-256-openshift-installer-10158-nightly-4.21-e2e-vsphere-static-ovn-techpreview/2000553562261688320
the clusterbot cluster I kicked off with
@damdo: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/37dedfd0-da99-11f0-9de6-176ea443b671-0
@damdo: This PR was included in a payload test run from #256
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/37dedfd0-da99-11f0-9de6-176ea443b671-0
/hold for confirmation before merging
/verified by #256 (comment)
@damdo: This PR has been marked as verified by
/cherry-pick release-4.21
@damdo: once the present PR merges, I will cherry-pick it on top of
/jira refresh
@damdo: This pull request references Jira Issue OCPBUGS-69434, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
Requesting review from QA contact:
@jcpowermac: This PR was included in a payload test run from openshift/installer#10168
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/6d31a270-da9f-11f0-8de3-09718ad99d34-0
@jcpowermac: This PR was included in a payload test run from openshift/installer#10168
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/07912d30-daa1-11f0-8e10-4efbcd965801-0
@jcpowermac: This PR was included in a payload test run from openshift/installer#10168
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/90986500-daa9-11f0-8dd5-cf2c3cb09eb7-0
@jcpowermac: This PR was included in a payload test run from openshift/installer#10168
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/07289470-daae-11f0-9e47-f18678d47bb6-0
@tthvo: This PR was included in a payload test run from openshift/installer#10168
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/1a7751a0-dad2-11f0-9f77-667ab48c05be-0
@jcpowermac: This PR was included in a payload test run from openshift/installer#10169
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/10d23290-db46-11f0-8441-a9ced9f47a4d-0
Discussed with folks on https://redhat-internal.slack.com/archives/C0A2M43S199/p1765993521710459?thread_ts=1765540108.488539&cid=C0A2M43S199
We are ready to go ahead with this
/unhold
@damdo: Jira Issue OCPBUGS-69434: Some pull requests linked via external trackers have merged:
The following pull request, linked via external tracker, has not merged:
All associated pull requests must be merged or unlinked from the Jira bug in order for it to move to the next state. Once unlinked, request a bug refresh with
Jira Issue OCPBUGS-69434 has not been moved to the MODIFIED state. This PR is marked as verified. If the remaining PRs listed above are marked as verified before merging, the issue will automatically be moved to VERIFIED after all of the changes from the PRs are available in an accepted nightly payload.
@damdo: new pull request created: #257
Fix included in accepted release 4.22.0-0.nightly-2025-12-18-234253