Conversation

tjungblu (Contributor) commented Dec 4, 2025

The version-alignment check removed here was inadvertently deleting guard pods during upgrades, which caused etcd quorum loss when another component drained a node during a static pod rollout.

Signed-off-by: Thomas Jungblut <tjungblu@redhat.com>
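
The quorum loss described above follows from etcd's majority rule. The sketch below is only an illustration of that arithmetic, assuming a three-member control plane (the member count is not stated in this PR); `quorum` and `hasQuorum` are hypothetical helper names, not functions from the operator.

```go
package main

import "fmt"

// quorum returns the minimum number of healthy members a cluster of the
// given size needs to keep committing writes (a strict majority).
func quorum(members int) int {
	return members/2 + 1
}

// hasQuorum reports whether the cluster keeps quorum once the given
// number of members is unavailable.
func hasQuorum(members, unavailable int) bool {
	return members-unavailable >= quorum(members)
}

func main() {
	const members = 3 // assumption: a typical three-node control plane

	// One member restarting for a static pod rollout is tolerated.
	fmt.Println("rollout only:", hasQuorum(members, 1))

	// A node drain that removes a second member at the same time is not.
	fmt.Println("rollout + drain:", hasQuorum(members, 2))
}
```

With three members, a rollout alone leaves two healthy members and quorum holds; a concurrent drain leaves one and quorum is lost, which is the overlap the guard pods are meant to prevent.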
openshift-ci-robot added the jira/valid-reference (indicates that this PR references a valid Jira ticket of any type) and jira/invalid-bug (indicates that a referenced Jira bug is invalid for the branch this PR is targeting) labels Dec 4, 2025
@openshift-ci-robot

@tjungblu: This pull request references Jira Issue OCPBUGS-66334, which is invalid:

  • expected the bug to target the "4.21.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.


coderabbitai bot commented Dec 4, 2025

Walkthrough

Removed version alignment and synchronization checks from the guardRolloutPreCheck logic in the operator starter. The pre-check now only validates non-SNO topology using NewIsSingleNodePlatformFn, eliminating operator-version gating and etcd clusteroperator status synchronization waits.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Operator startup pre-check logic: pkg/operator/starter.go | Removed etcd clusteroperator/operator version alignment verification, Status.Versions extraction, expected version comparison, and related synchronization/mismatch error handling from guardRolloutPreCheck. Simplified to only determine non-SNO topology. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~30 minutes

  • Verify removal impact: Ensure no downstream code depends on the removed version alignment checks or synchronization waits during operator startup.
  • Operator initialization flow: Confirm the simplified pre-check (topology-only) doesn't bypass critical version compatibility validations elsewhere.
  • Error handling: Review whether elimination of not-synced and mismatch error scenarios could mask version inconsistencies in SNO/non-SNO deployments.

📜 Recent review details

📥 Commits

Reviewing files that changed from the base of the PR and between 57c4cb5 and a703d01.

📒 Files selected for processing (1)
  • pkg/operator/starter.go (0 hunks)
💤 Files with no reviewable changes (1)
  • pkg/operator/starter.go

@openshift-ci openshift-ci bot requested review from dusk125 and jubittajohn December 4, 2025 10:52
openshift-ci bot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) Dec 4, 2025
dusk125 (Contributor) commented Dec 4, 2025

/retest-required

openshift-ci bot (Contributor) commented Dec 4, 2025

@tjungblu: all tests passed!


dusk125 (Contributor) commented Dec 4, 2025

/lgtm

openshift-ci bot added the lgtm label (indicates that a PR is ready to be merged) Dec 4, 2025
openshift-ci bot (Contributor) commented Dec 4, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dusk125, tjungblu


tjungblu (Contributor, Author) commented Dec 5, 2025

Thanks @dusk125 - I'm going to leave this here until the critical fix label is lifted again
