-
Notifications
You must be signed in to change notification settings - Fork 86
Add deployment readiness conditions to DPA status #2047
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: oadp-dev
Are you sure you want to change the base?
Add deployment readiness conditions to DPA status #2047
Conversation
This change addresses the gap where DPA could show Reconciled=True while underlying pods (Velero, Non-Admin, VMFR, NodeAgent) were not actually running due to deployment issues. Added new readiness conditions: - VeleroReady: Tracks Velero deployment readiness - NodeAgentReady: Tracks NodeAgent DaemonSet readiness - NonAdminReady: Tracks Non-Admin controller deployment readiness - VMFileRestoreReady: Tracks VM File Restore controller readiness Behavior: - Disabled components show True with reason ComponentDisabled - Missing deployments show False with reason ComponentNotFound - DPA requeues every 30 seconds until all components are ready - Existing Reconciled condition unchanged (tracks resource creation) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
|
Skipping CI for Draft Pull Request. |
WalkthroughIntroduces readiness condition constants for DPA components, implements centralized readiness evaluation logic for Velero, NodeAgent, NonAdmin, and VMFileRestore through a new readiness module, integrates readiness checks into the main reconciliation loop, and adds comprehensive test coverage for all readiness scenarios. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: Organization UI Review profile: CHILL Plan: Pro Cache: Disabled due to data retention organization setting Knowledge base: Disabled due to 📒 Files selected for processing (5)
💤 Files with no reviewable changes (1)
🧰 Additional context used📓 Path-based instructions (1)**⚙️ CodeRabbit configuration file
Files:
🧬 Code graph analysis (1)internal/controller/readiness_test.go (2)
🔇 Additional comments (11)
Comment |
|
summary lgtm will tal tmr. |
@shubham-pampattiwar nice.. looks |
| // Requeue if not all components are ready (to check again later) | ||
| if !allReady { | ||
| return ctrl.Result{RequeueAfter: 30 * time.Second}, err | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using watch would be more efficient and reactive?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. For now using requeue to keep it simple - watches could cause frequent reconciles. Can optimize in a follow-up if 30s proves too slow in practice.
Address review comment: propagate readinessErr instead of just logging, so that API errors during readiness checks trigger a reconcile retry. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
|
@shubham-pampattiwar: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: shubham-pampattiwar, sseago, weshayutin The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
mpryc
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
Summary
VeleroReady,NodeAgentReady,NonAdminReady,VMFileRestoreReadyTruewith reasonComponentDisabledProblem
Previously, DPA could show
Reconciled=Truewhile underlying pods were not running due to deployment issues (e.g., CrashLoopBackOff, image pull errors). This made it difficult to diagnose issues.Solution
Added separate readiness conditions that check:
Status.ReadyReplicas == Spec.ReplicasStatus.NumberReady == Status.DesiredNumberScheduledTest plan
make test)ComponentDisabledreason🤖 Generated with Claude Code