wip-drain capture-codex potter #4188
base: master
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: (none listed)
The full list of commands accepted by this bot can be found here.

Details: Needs approval from an approver in each of these files. Approvers can indicate their approval by writing …
📝 Walkthrough

This pull request introduces a comprehensive node liveness tracking and coordinated draining system. New components include state management (Alive/Draining/Stopping), drain controllers, liveness views for filtering, heartbeat messaging infrastructure, and integrations across schedulers, APIs, and maintainer nodes to enable graceful node drain operations.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client as Client/API
    participant Coord as Coordinator
    participant DC as DrainController
    participant NLV as NodeLiveness
    participant MM as Maintainer
    participant Sched as Scheduler

    Client->>Coord: DrainNode(nodeID)
    Coord->>DC: RequestDrain(nodeID)
    DC->>NLV: Update state tracking
    MM->>Coord: NodeHeartbeat (periodic)
    Coord->>NLV: HandleNodeHeartbeat
    Coord->>DC: SetNodeLivenessRequest
    DC-->>Coord: SetNodeLiveness(DRAINING)
    Coord->>MM: SetNodeLivenessRequest
    MM->>Coord: SetNodeLivenessResponse(DRAINING)
    Coord->>NLV: HandleSetNodeLivenessResponse
    Sched->>NLV: FilterSchedulableDestNodes
    NLV-->>Sched: Alive nodes only
    Sched->>DC: Remaining(nodeID)
    DC-->>Sched: Count of inflight ops + changefeeds
    Sched->>Coord: MoveMaintainerOperator
    Coord-->>Client: Drain progressing
```
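Behind these messages sits a small monotonic state machine: a node only moves forward through Alive, Draining, and Stopping. The sketch below is a simplified illustration of that idea, assuming sequential numeric values and shortened constant names; the PR's actual type lives in pkg/api/util.go and uses different numbers for Draining and Stopping, as the enum discussion further down notes.

```go
package api

import "sync/atomic"

// Liveness is an illustrative, simplified version of the monotonic node
// liveness value this PR introduces (Alive -> Draining -> Stopping,
// downgrades rejected). Names and numeric values here are assumptions.
type Liveness int32

const (
	LivenessAlive    Liveness = 0
	LivenessDraining Liveness = 1
	LivenessStopping Liveness = 2
)

// Load reads the current state atomically.
func (l *Liveness) Load() Liveness {
	return Liveness(atomic.LoadInt32((*int32)(l)))
}

// Store applies next only if it does not move the state backwards, and
// reports whether the transition was accepted.
func (l *Liveness) Store(next Liveness) bool {
	for {
		cur := l.Load()
		if next < cur {
			return false // e.g. Draining -> Alive or Stopping -> Draining is rejected
		}
		if atomic.CompareAndSwapInt32((*int32)(l), int32(cur), int32(next)) {
			return true
		}
	}
}
```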
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes
🚥 Pre-merge checks: ❌ 3 failed (2 warnings, 1 inconclusive)
Summary of Changes

Hello @hongyunyan, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request implements a robust node draining feature, crucial for maintaining cluster stability during maintenance or scaling operations. It integrates new node liveness states into the coordinator's scheduling and election logic, ensuring that nodes can be gracefully removed without disrupting ongoing changefeeds. The changes span API updates, new controller and scheduler components, and modifications to the messaging and heartbeat protocols.

Highlights
[FORMAT CHECKER NOTIFICATION] Notice: To remove the … 📖 For more info, you can check the "Contribute Code" section in the development guide.
Code Review
This pull request introduces a node draining mechanism, a significant feature for graceful shutdown and maintenance. It is implemented across multiple components (the API, coordinator, schedulers, and maintainer manager), using a new drain.Controller and DrainScheduler for the core logic together with a node liveness mechanism. However, a significant security oversight was identified: the drainCapture API endpoint is exposed without authentication, allowing potential Denial of Service attacks. It is recommended to apply authentication middleware to all sensitive API v1 endpoints, especially those that perform state-changing operations such as draining captures. There are also other areas for improvement, including a potential data race, an inconsistency in how node states are handled, and some confusing enum values.
```diff
 func (o *OpenAPIV1) drainCapture(c *gin.Context) {
 	var req drainCaptureRequest
 	if err := c.ShouldBindJSON(&req); err != nil {
 		_ = c.Error(errors.ErrAPIInvalidParam.Wrap(err))
 		return
 	}
-	drainCaptureCounter.Add(1)
-	if drainCaptureCounter.Load()%10 == 0 {
-		log.Info("api v1 drainCapture", zap.Any("captureID", req.CaptureID), zap.Int64("currentTableCount", drainCaptureCounter.Load()))
-		c.JSON(http.StatusAccepted, &drainCaptureResp{
-			CurrentTableCount: 10,
-		})
-	} else {
-		log.Info("api v1 drainCapture done", zap.Any("captureID", req.CaptureID), zap.Int64("currentTableCount", drainCaptureCounter.Load()))
-		c.JSON(http.StatusAccepted, &drainCaptureResp{
-			CurrentTableCount: 0,
-		})
-	}
+	coordinator, err := o.server.GetCoordinator()
+	if err != nil {
+		_ = c.Error(err)
+		return
+	}
+	drainable, ok := coordinator.(interface {
+		DrainNode(ctx context.Context, nodeID string) (int, error)
+	})
+	if !ok {
+		_ = c.Error(stdErrors.New("coordinator does not support node drain"))
+		return
+	}
+	remaining, err := drainable.DrainNode(c.Request.Context(), req.CaptureID)
+	if err != nil {
+		_ = c.Error(err)
+		return
+	}
+	log.Info("api v1 drainCapture",
+		zap.String("captureID", req.CaptureID),
+		zap.Int("remaining", remaining))
+	c.JSON(http.StatusAccepted, &drainCaptureResp{
+		CurrentTableCount: remaining,
+	})
 }
```
The drainCapture API endpoint, now fully implemented and functional, lacks authenticateMiddleware in its route registration (RegisterOpenAPIV1Routes). This is a significant security vulnerability, as it allows any unauthenticated user to trigger a node drain operation, potentially leading to a Denial of Service (DoS) by draining all nodes. It is crucial to protect the captureGroup or the specific drain route with authenticateMiddleware. Furthermore, for consistency with other error handling and to provide more structured error information to clients, it's better to use a typed error from the pkg/errors package instead of a raw string error from the standard library.
```go
_ = c.Error(errors.ErrInternalServerError.WithMessage("coordinator does not support node drain"))
```
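To illustrate the authentication point above, here is a minimal sketch of gating the drain route behind the authentication middleware. The group path, parameter names, and function signature are assumptions made for illustration; they are not the repository's actual route registration code.

```go
package api

import "github.com/gin-gonic/gin"

// registerDrainRoutes sketches how the drain endpoint could be protected by
// the authentication middleware mentioned above. The route path, handler, and
// middleware names here are assumptions, not the project's actual code.
func registerDrainRoutes(v1 *gin.RouterGroup, authenticate gin.HandlerFunc, drainCapture gin.HandlerFunc) {
	captures := v1.Group("/captures")
	// Reject unauthenticated callers before any state changing drain operation.
	captures.Use(authenticate)
	captures.PUT("/drain", drainCapture)
}
```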
```go
	c.mu.Lock()
	st.lastSendDrain = now
	c.mu.Unlock()
```

The variable st was captured from a previous critical section (lines 64-67) and is being used here after the lock has been released and re-acquired. This is a potential data race. To ensure thread safety, you should re-fetch the state from the map within this new critical section.
```go
c.mu.Lock()
if st, ok := c.nodes[nodeID]; ok {
	st.lastSendDrain = now
}
c.mu.Unlock()
```

```go
const (
	// LivenessCaptureAlive means the capture is alive, and ready to serve.
	LivenessCaptureAlive Liveness = 0
	// LivenessCaptureDraining means the capture is preparing to go offline.
	// It should not be selected as a scheduling destination and should not campaign leadership.
	LivenessCaptureDraining Liveness = 2
	// LivenessCaptureStopping means the capture is in the process of graceful shutdown.
	LivenessCaptureStopping Liveness = 1
)
```
The integer values for the Liveness enum are non-sequential and confusing (Alive=0, Stopping=1, Draining=2). This is inconsistent with the corresponding protobuf enum NodeLiveness where DRAINING=1 and STOPPING=2. This discrepancy leads to confusing value-swapping logic in maintainer/maintainer_manager.go and makes the code error-prone.
To improve clarity and reduce the chance of bugs, please consider making the values sequential and consistent with the protobuf definition. A more logical order would be:
```go
LivenessCaptureAlive    Liveness = 0
LivenessCaptureDraining Liveness = 1
LivenessCaptureStopping Liveness = 2
```
This change would also simplify the mapping logic in maintainer/maintainer_manager.go.
```go
// GetNodesByState returns node IDs whose derived state equals state.
func (v *View) GetNodesByState(state State, now time.Time) []node.ID {
	v.mu.RLock()
	defer v.mu.RUnlock()

	out := make([]node.ID, 0)
	for id, r := range v.data {
		if r == nil || !r.everSeenHeartbeat {
			continue
		}
		s := StateAlive
		if v.ttl > 0 && now.Sub(r.lastSeen) > v.ttl {
			s = StateUnknown
		} else {
			switch r.liveness {
			case heartbeatpb.NodeLiveness_DRAINING:
				s = StateDraining
			case heartbeatpb.NodeLiveness_STOPPING:
				s = StateStopping
			default:
				s = StateAlive
			}
		}
		if s == state {
			out = append(out, id)
		}
	}
	return out
}
```
The logic in GetNodesByState is inconsistent with GetState regarding never-seen nodes. GetState returns StateAlive for a never-seen node (for backward compatibility during rollout), but GetNodesByState(StateAlive, ...) will not include it because it only iterates over nodes present in the data map.
This could lead to confusing behavior. For example, FilterSchedulableDestNodes would consider a never-seen node as schedulable, but a call to GetNodesByState(StateAlive) would not list it.
While the current usage in drain.Controller seems safe as it only queries for StateDraining and StateStopping, this inconsistency might cause issues in the future. Please consider aligning the behavior to make the component more predictable.
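A hypothetical test sketch of that divergence follows; constructor and method names are taken from the snippets quoted in this review, and the import paths are assumptions, so exact signatures may differ.

```go
package nodeliveness_test

import (
	"testing"
	"time"

	"github.com/stretchr/testify/require"

	// Import paths are assumptions based on the file paths quoted in this review.
	"github.com/pingcap/ticdc/coordinator/nodeliveness"
	"github.com/pingcap/ticdc/pkg/node"
)

// TestNeverSeenNodeStateVsListing sketches the inconsistency described above:
// GetState treats a node with no heartbeat history as Alive, while
// GetNodesByState(StateAlive) never lists it because it only iterates over
// nodes already present in the view's data map.
func TestNeverSeenNodeStateVsListing(t *testing.T) {
	v := nodeliveness.NewView(30 * time.Second)
	now := time.Now()
	unknown := node.ID("node-never-seen")

	require.Equal(t, nodeliveness.StateAlive, v.GetState(unknown, now))
	require.NotContains(t, v.GetNodesByState(nodeliveness.StateAlive, now), unknown)
}
```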
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
pkg/api/util.go (1)
201-211: ⚠️ Potential issue | 🟡 Minor

String() performs a non-atomic read of the liveness value.

switch *l dereferences the pointer without using atomic.LoadInt32. For consistency with Load() and correctness under concurrent access, use l.Load().

Proposed fix

```diff
 func (l *Liveness) String() string {
-	switch *l {
+	switch l.Load() {
 	case LivenessCaptureAlive:
 		return "Alive"
 	case LivenessCaptureDraining:
```
🤖 Fix all issues with AI agents
In `@coordinator/drain/controller.go`:
- Around line 214-238: The sendSetNodeLiveness method currently accepts an
unused now time.Time parameter; remove the unused parameter by changing the
Controller.sendSetNodeLiveness signature to func (c *Controller)
sendSetNodeLiveness(nodeID node.ID, target heartbeatpb.NodeLiveness) and update
all call sites that pass the now argument to call sendSetNodeLiveness(nodeID,
target) instead; ensure imports/unused variable checks are fixed after the
change and run go vet/build to confirm no remaining references to now remain.
In `@coordinator/nodeliveness/view.go`:
- Around line 162-189: GetNodesByState duplicates the TTL + liveness → State
mapping from GetState; extract that logic into a private, lock-free helper
(e.g., func (v *View) deriveStateFromRecord(r *record, now time.Time) State)
that reads r.everSeenHeartbeat, r.lastSeen, r.liveness and v.ttl to return the
derived State, then call this helper from both GetState and GetNodesByState (use
it on each r from v.data inside GetNodesByState while keeping the RLock). Ensure
the helper accepts a *record and now so no locking is performed inside it and
update both callers to use the new function to keep derivation rules in one
place.
In `@coordinator/scheduler/drain.go`:
- Around line 70-98: The drain scheduler currently uses s.batchSize directly and
can exceed controller capacity; compute availableSize := s.batchSize -
s.operatorController.OperatorSize() (clamp to >=0) at the start of Schedule loop
and use availableSize instead of s.batchSize when checking scheduled limits and
breaking out of loops (e.g., replace checks like scheduled >= s.batchSize with
scheduled >= availableSize). Keep all other logic (drainingNodes iteration,
s.rrIndex, s.changefeedDB.GetByNodeID, s.operatorController.HasOperatorByID,
pickLeastLoadedNode, s.operatorController.AddOperator, nodeTaskSize updates) the
same so drain scheduling respects the controller capacity cap.
In `@heartbeatpb/heartbeat.proto`:
- Around line 129-157: The NodeLiveness enum in heartbeat.proto has DRAINING=1
and STOPPING=2, but the Go constants LivenessCaptureDraining and
LivenessCaptureStopping in pkg/api/util.go are swapped; update the Go constants
so LivenessCaptureDraining = 1 and LivenessCaptureStopping = 2 to match the
proto (adjust the numeric values in the LivenessCapture... constant
declarations), then run tests/linters to ensure no usages rely on the old
numeric values; conversion helpers (if any) like those mapping NodeLiveness <->
Liveness should remain correct.
In `@server/module_election.go`:
- Around line 120-122: The resign failure is being masked by returning the wrong
error variable; update both places where we call e.resign(ctx) (the coordinator
resign path and the log coordinator path) to return the resignErr instead of err
and ensure the error logged with log.Warn still includes resignErr (e.g., change
the return from errors.Trace(err) to errors.Trace(resignErr) for the code paths
around e.resign(ctx) and the corresponding log coordinator branch).
- Around line 264-266: The resign error returned from the active resignation
path is being discarded because the code returns errors.Trace(err) (where err is
nil after a successful Campaign) instead of the actual resignErr; update the
return in the resignation failure branch in module_election.go to return
errors.Trace(resignErr) and ensure the warning log still uses
zap.Error(resignErr) (refer to resignErr and the log.Warn call near the active
resign branch in the election handling code).
🧹 Nitpick comments (12)
server/module_election.go (2)
135-156: Resign watchdog: silently discarding the resign error. Line 151 discards the resign error. If the resign fails (e.g., etcd is unreachable), the watchdog exits silently and the node continues holding leadership until the lease expires. Consider logging the error so operators can diagnose why leadership wasn't relinquished promptly.

Suggested improvement

```diff
 log.Info("resign coordinator actively, liveness is stopping", zap.String("nodeID", nodeID))
 resignCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
-_ = e.resign(resignCtx)
+if err := e.resign(resignCtx); err != nil {
+	log.Warn("resign coordinator failed in liveness watcher",
+		zap.String("nodeID", nodeID), zap.Error(err))
+}
 cancel()
 return
```

274-293: Same issue: the log coordinator watchdog silently discards the resign error. Same concern as the coordinator watchdog — the resignLogCoordinator() error is silently discarded on line 289. Also, unlike the coordinator watchdog, this doesn't use a timeout context for the resign call (the timeout is inside resignLogCoordinator itself, so this is fine, but worth noting for consistency).

Suggested improvement

```diff
 log.Info("resign log coordinator actively, liveness is stopping", zap.String("nodeID", nodeID))
-_ = e.resignLogCoordinator()
+if err := e.resignLogCoordinator(); err != nil {
+	log.Warn("resign log coordinator failed in liveness watcher",
+		zap.String("nodeID", nodeID), zap.Error(err))
+}
 return
```

pkg/api/util.go (1)
148-163: Numeric values don't match the documented monotonic progression order. The comment documents ALIVE(0) -> DRAINING -> STOPPING, but the actual values are ALIVE=0, DRAINING=2, STOPPING=1. While Store() uses explicit switch logic (so this is functionally safe), the non-monotonic numbering is confusing and inconsistent with the proto definition (DRAINING=1, STOPPING=2). This likely exists for backward compatibility with LivenessCaptureStopping=1, but a brief comment explaining this would help future readers.

Also worth noting: the proto NodeLiveness enum has DRAINING=1, STOPPING=2 while this Go enum has Draining=2, Stopping=1 — see the related comment on heartbeat.proto.

pkg/api/util_test.go (1)

9-35: LGTM! Tests cover the critical transition paths: monotonic progression, direct Alive→Stopping, and rejection of downgrades. Good use of require for clear assertion failures.

One minor gap: the Draining→Alive downgrade rejection isn't explicitly tested (only Stopping→Draining and Stopping→Alive are). Consider adding a brief assertion if you want full transition matrix coverage:

```go
// In TestLivenessStoreMonotonic, after transitioning to Draining:
require.False(t, l.Store(LivenessCaptureAlive)) // reject downgrade
```
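A standalone version of that assertion could look like the sketch below; the type and constant names follow the snippets quoted in this review, and Store is assumed to report whether the transition was accepted, as in the existing tests.

```go
package api

import (
	"testing"

	"github.com/stretchr/testify/require"
)

// TestLivenessStoreRejectsAliveAfterDraining sketches the missing downgrade
// assertion suggested above.
func TestLivenessStoreRejectsAliveAfterDraining(t *testing.T) {
	var l Liveness                                    // zero value is LivenessCaptureAlive
	require.True(t, l.Store(LivenessCaptureDraining)) // Alive -> Draining is allowed
	require.False(t, l.Store(LivenessCaptureAlive))   // Draining -> Alive is rejected
	require.Equal(t, LivenessCaptureDraining, l.Load())
}
```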
maintainer/maintainer_manager.go (1)

364-404: onSetNodeLivenessRequest: log message contains a hyphen. Line 368: "ignore set node liveness request from non-coordinator" — the coding guideline says log message strings should avoid hyphens (use spaces instead). Consider "ignore set node liveness request from non coordinator" or rewording to "ignore set node liveness request, sender is not coordinator".

As per coding guidelines, "log message strings should not include function names and should avoid hyphens (use spaces instead)".

Suggested wording

```diff
-	log.Warn("ignore set node liveness request from non-coordinator",
+	log.Warn("ignore set node liveness request, sender is not coordinator",
```

coordinator/coordinator.go (1)

419-425: DrainNode always returns a nil error — consider propagating errors from the controller. Currently DrainNode always returns a nil error. If the controller's DrainNode encounters an invalid/unknown node ID, the caller (API layer) has no way to return a meaningful error to the user. Consider whether controller.DrainNode should return an error for unknown nodes, or whether you want to validate the node ID here.
api/v1/api.go (2)

225-227: Log message includes the handler name drainCapture. Per coding guidelines, log message strings should not include function names. Consider rewording to something like "drain capture requested via api v1".

As per coding guidelines, "log message strings should not include function names and should avoid hyphens (use spaces instead)".

Suggested wording

```diff
-	log.Info("api v1 drainCapture",
+	log.Info("drain capture requested via api v1",
```

212-218: Use the project's errors package instead of stdErrors.New for consistency with codebase patterns. Line 216 uses stdErrors.New(...), but all other error handling in this file uses the project's error types (e.g., errors.ErrAPIInvalidParam). While stdErrors.New() functionally results in an HTTP 500 response (which is semantically correct for this scenario), it bypasses the project's error handling conventions. Use errors.ErrInternalServerError.GenWithStackByArgs(...) to maintain consistency with the established pattern.

Suggested change

```diff
-	_ = c.Error(stdErrors.New("coordinator does not support node drain"))
+	_ = c.Error(errors.ErrInternalServerError.GenWithStackByArgs("coordinator does not support node drain"))
```
coordinator/controller.go (1)

126-126: Extract the 30 s liveness TTL into a named constant. The TTL is a tuning knob that governs when a node is deemed "unknown". A named constant (or configuration parameter) would make it easier to discover and adjust.

Proposed fix

```diff
 const (
 	bootstrapperID                = "coordinator"
 	nodeChangeHandlerID           = "coordinator-controller"
 	createChangefeedMaxRetry      = 10
 	createChangefeedRetryInterval = 5 * time.Second
+	defaultLivenessTTL            = 30 * time.Second
 )

-	livenessView := nodeliveness.NewView(30 * time.Second)
+	livenessView := nodeliveness.NewView(defaultLivenessTTL)
```

coordinator/scheduler/drain.go (1)

108-123: pickLeastLoadedNode is correct but non-deterministic on ties. Map iteration order is random in Go, so when multiple nodes share the minimum load, the winner is arbitrary. This is acceptable for load balancing. A stable tie-breaker (e.g., by node ID) would make behavior more predictable and testable, but is not strictly necessary.
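If a deterministic choice is ever wanted, one possible shape is sketched below. The element type is a stand-in for the project's node.ID (assumed to be an ordered string), and the function is illustrative rather than the PR's actual code.

```go
package scheduler

import "math"

// nodeID stands in for the project's node.ID type; this sketch only
// illustrates a stable tie break for the least loaded selection.
type nodeID string

// pickLeastLoadedNodeStable prefers the lowest load and breaks ties by the
// smaller node ID, so repeated runs pick the same destination.
func pickLeastLoadedNodeStable(candidates []nodeID, load map[nodeID]int) nodeID {
	var best nodeID
	bestLoad := math.MaxInt
	for _, id := range candidates {
		l := load[id]
		if l < bestLoad || (l == bestLoad && best != "" && id < best) {
			best = id
			bestLoad = l
		}
	}
	return best
}
```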
coordinator/drain/controller.go (2)
129-138: Lock churn: acquiring and releasing mu per node inside the loop. Each iteration of the two loops (lines 129-133 and 134-138) acquires and releases c.mu individually. Since this runs on a periodic tick with a small number of draining nodes, performance impact is negligible, but you could batch both loops under a single lock acquisition for clarity.

Proposed consolidation

```diff
 func (c *Controller) tick(now time.Time) {
 	if c.livenessView == nil {
 		return
 	}
-	for _, id := range c.livenessView.GetNodesByState(nodeliveness.StateDraining, now) {
-		c.mu.Lock()
-		c.mustGetStateLocked(id)
-		c.mu.Unlock()
-	}
-	for _, id := range c.livenessView.GetNodesByState(nodeliveness.StateStopping, now) {
-		c.mu.Lock()
-		c.mustGetStateLocked(id)
-		c.mu.Unlock()
-	}
-
-	c.mu.Lock()
-	ids := make([]node.ID, 0, len(c.nodes))
-	for id := range c.nodes {
-		ids = append(ids, id)
-	}
-	c.mu.Unlock()
+	drainingIDs := c.livenessView.GetNodesByState(nodeliveness.StateDraining, now)
+	stoppingIDs := c.livenessView.GetNodesByState(nodeliveness.StateStopping, now)
+
+	c.mu.Lock()
+	for _, id := range drainingIDs {
+		c.mustGetStateLocked(id)
+	}
+	for _, id := range stoppingIDs {
+		c.mustGetStateLocked(id)
+	}
+	ids := make([]node.ID, 0, len(c.nodes))
+	for id := range c.nodes {
+		ids = append(ids, id)
+	}
+	c.mu.Unlock()

 	for _, id := range ids {
 		c.tickNode(id, now)
 	}
 }
```

240-245: Replace the custom maxInt helper with the built-in max function. Go 1.25.5 supports the built-in max() function introduced in Go 1.21, so this custom helper is unnecessary.
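For reference, the built-in replaces the helper directly; the function below is only an illustration with made-up names, not the PR's code.

```go
package drain

// remainingBudget illustrates using the built-in max (Go 1.21+) instead of a
// custom maxInt helper to clamp a value at zero.
func remainingBudget(batchSize, inflight int) int {
	return max(batchSize-inflight, 0)
}
```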
```go
func (c *Controller) sendSetNodeLiveness(nodeID node.ID, target heartbeatpb.NodeLiveness, now time.Time) {
	epoch := uint64(0)
	if c.livenessView != nil {
		if e, ok := c.livenessView.GetNodeEpoch(nodeID); ok {
			epoch = e
		}
	}

	req := &heartbeatpb.SetNodeLivenessRequest{
		Target:    target,
		NodeEpoch: epoch,
	}
	msg := messaging.NewSingleTargetMessage(nodeID, messaging.MaintainerManagerTopic, req)
	if err := c.mc.SendCommand(msg); err != nil {
		log.Warn("send set node liveness request failed",
			zap.Stringer("target", nodeID),
			zap.String("liveness", target.String()),
			zap.Error(err))
		return
	}
	log.Info("send set node liveness request",
		zap.Stringer("target", nodeID),
		zap.String("liveness", target.String()),
		zap.Uint64("nodeEpoch", epoch))
}
```
Unused now parameter in sendSetNodeLiveness.
The now time.Time parameter is accepted but never referenced in the function body. Either use it (e.g., for logging or metrics) or remove it to avoid confusion.
Proposed fix
```diff
-func (c *Controller) sendSetNodeLiveness(nodeID node.ID, target heartbeatpb.NodeLiveness, now time.Time) {
+func (c *Controller) sendSetNodeLiveness(nodeID node.ID, target heartbeatpb.NodeLiveness) {
```

And update the three call sites (lines 69, 170, 197) to drop the now argument.
```go
// GetNodesByState returns node IDs whose derived state equals state.
func (v *View) GetNodesByState(state State, now time.Time) []node.ID {
	v.mu.RLock()
	defer v.mu.RUnlock()

	out := make([]node.ID, 0)
	for id, r := range v.data {
		if r == nil || !r.everSeenHeartbeat {
			continue
		}
		s := StateAlive
		if v.ttl > 0 && now.Sub(r.lastSeen) > v.ttl {
			s = StateUnknown
		} else {
			switch r.liveness {
			case heartbeatpb.NodeLiveness_DRAINING:
				s = StateDraining
			case heartbeatpb.NodeLiveness_STOPPING:
				s = StateStopping
			default:
				s = StateAlive
			}
		}
		if s == state {
			out = append(out, id)
		}
	}
	return out
}
```
🛠️ Refactor suggestion | 🟠 Major
Duplicated state-derivation logic — extract a lock-free helper to keep GetState and GetNodesByState in sync.
GetNodesByState re-implements the TTL + liveness → State mapping that GetState already performs (lines 172-184 vs 106-116). If the derivation rules change (e.g., a new state is added), the two copies can diverge silently.
Extract the derivation into a private method that operates on a *record without acquiring the lock, then call it from both public methods.
Proposed refactor
```diff
+// deriveState returns the liveness state for a record without acquiring any locks.
+func (v *View) deriveState(r *record, now time.Time) State {
+ if r == nil || !r.everSeenHeartbeat {
+ return StateAlive
+ }
+ if v.ttl > 0 && now.Sub(r.lastSeen) > v.ttl {
+ return StateUnknown
+ }
+ switch r.liveness {
+ case heartbeatpb.NodeLiveness_DRAINING:
+ return StateDraining
+ case heartbeatpb.NodeLiveness_STOPPING:
+ return StateStopping
+ default:
+ return StateAlive
+ }
+}
func (v *View) GetState(id node.ID, now time.Time) State {
- var (
- lastSeen time.Time
- liveness heartbeatpb.NodeLiveness
- everSeen bool
- )
v.mu.RLock()
- r := v.data[id]
- if r != nil {
- lastSeen = r.lastSeen
- liveness = r.liveness
- everSeen = r.everSeenHeartbeat
- }
+ r := v.data[id]
v.mu.RUnlock()
-
- if r == nil || !everSeen {
- return StateAlive
- }
- if v.ttl > 0 && now.Sub(lastSeen) > v.ttl {
- return StateUnknown
- }
- switch liveness {
- case heartbeatpb.NodeLiveness_DRAINING:
- return StateDraining
- case heartbeatpb.NodeLiveness_STOPPING:
- return StateStopping
- default:
- return StateAlive
- }
+ return v.deriveState(r, now)
}
func (v *View) GetNodesByState(state State, now time.Time) []node.ID {
v.mu.RLock()
defer v.mu.RUnlock()
out := make([]node.ID, 0)
for id, r := range v.data {
- if r == nil || !r.everSeenHeartbeat {
- continue
- }
- s := StateAlive
- if v.ttl > 0 && now.Sub(r.lastSeen) > v.ttl {
- s = StateUnknown
- } else {
- switch r.liveness {
- case heartbeatpb.NodeLiveness_DRAINING:
- s = StateDraining
- case heartbeatpb.NodeLiveness_STOPPING:
- s = StateStopping
- default:
- s = StateAlive
- }
- }
- if s == state {
+ if v.deriveState(r, now) == state {
out = append(out, id)
}
}
return out
 }
```
```go
	scheduled := 0
	for i := 0; i < len(drainingNodes) && scheduled < s.batchSize; i++ {
		origin := drainingNodes[(s.rrIndex+i)%len(drainingNodes)]
		changefeeds := s.changefeedDB.GetByNodeID(origin)
		if len(changefeeds) == 0 {
			continue
		}

		for _, cf := range changefeeds {
			if scheduled >= s.batchSize {
				break
			}
			if s.operatorController.HasOperatorByID(cf.ID) {
				continue
			}

			dest := pickLeastLoadedNode(destNodes, nodeTaskSize)
			if dest == "" {
				log.Info("no schedulable destination node for drain",
					zap.Stringer("origin", origin))
				return now.Add(time.Second)
			}

			if !s.operatorController.AddOperator(operator.NewMoveMaintainerOperator(s.changefeedDB, cf, origin, dest)) {
				continue
			}
			nodeTaskSize[dest]++
			scheduled++
		}
```
Drain scheduler does not account for existing in-flight operators when capping batchSize.
The basic scheduler computes availableSize := batchSize - operatorController.OperatorSize() before scheduling. The drain scheduler uses s.batchSize directly, which could cause the operator controller to exceed its intended capacity when drain and regular scheduling run concurrently.
If drain should be rate-limited like other schedulers, subtract current operator count from the batch limit. If this is intentional (drain takes priority), a comment explaining the choice would help.
Proposed fix (if drain should respect capacity)
```diff
+ availableSize := s.batchSize - s.operatorController.OperatorSize()
+ if availableSize <= 0 {
+ return now.Add(drainCheckInterval)
+ }
+
scheduled := 0
- for i := 0; i < len(drainingNodes) && scheduled < s.batchSize; i++ {
+ for i := 0; i < len(drainingNodes) && scheduled < availableSize; i++ {
 	origin := drainingNodes[(s.rrIndex+i)%len(drainingNodes)]
```
```proto
// NodeLiveness is the node-level liveness state reported by a NodeAgent.
//
// It is used by the coordinator to:
//   - filter destination candidates for scheduling
//   - drive node drain progress
//   - avoid campaigning/residing leadership on nodes preparing to go offline
enum NodeLiveness {
  ALIVE = 0;
  DRAINING = 1;
  STOPPING = 2;
}

// NodeHeartbeat is a node-scoped heartbeat, independent of maintainer heartbeats.
message NodeHeartbeat {
  NodeLiveness liveness = 1;
  uint64 node_epoch = 2;
}

// SetNodeLivenessRequest asks a node to upgrade its local liveness monotonically.
message SetNodeLivenessRequest {
  NodeLiveness target = 1;
  uint64 node_epoch = 2;
}

// SetNodeLivenessResponse reports the liveness applied by the node and its current epoch.
message SetNodeLivenessResponse {
  NodeLiveness applied = 1;
  uint64 node_epoch = 2;
}
```
🧩 Analysis chain

🏁 Scripts executed against pingcap/ticdc:

```shell
# First, find and examine the Go Liveness enum definition
fd -e go -x grep -l "LivenessCapture" {} \; | head -5

# Get the actual enum values from pkg/api/util.go
rg -n "LivenessCapture" -A 1 --type=go -g "!*test*" -g "!*.pb.go"

# Search for conversion functions or casting between NodeLiveness and Liveness
rg -n -B3 -A3 "NodeLiveness|Liveness.*=" --type=go -g "!*.pb.go" -g "!*_test.go" | head -100

# Look for proto imports and usage patterns
rg -n "heartbeatpb\." --type=go -g "!*.pb.go" | head -50

# Get the full onSetNodeLivenessRequest function implementation
sed -n '364,400p' maintainer/maintainer_manager.go

# Search for any other conversion patterns between the two types
rg -n "api\.Liveness|heartbeatpb\.NodeLiveness" --type=go -B2 -A2 maintainer/maintainer_manager.go | head -80

# Look for the conversion function at lines 455-460
sed -n '450,465p' maintainer/maintainer_manager.go

# Search for any direct int/uint32 casting or numeric conversions with liveness
rg -n "int32|uint32.*iveness|iveness.*int32|iveness.*uint32" --type=go maintainer/

# Check for any generated proto code that might do numeric conversion
fd -e pb.go -x rg -l "NodeLiveness" {} \;

# Search for any proto.Marshal or proto.Unmarshal operations that might bypass the conversion functions
rg -n "Marshal|Unmarshal" --type=go maintainer/maintainer_manager.go

# Check for any proto conversion in the generated pb.go file
rg -n "NodeLiveness" heartbeatpb/heartbeat.pb.go | head -30

# Verify there are no other conversion functions or casting paths
rg -n "NodeLiveness_|LivenessCapture" --type=go -g "!*test*" -g "!*.pb.go" | grep -v "maintainer_manager\|module_election\|health.go\|util.go"

# Final check: verify all uses of NodeLiveness in coordinator package
rg -n "NodeLiveness_" --type=go coordinator/ | head -20
```
Numeric value mismatch between proto NodeLiveness and Go Liveness enum should be aligned.
The proto defines DRAINING=1, STOPPING=2, but pkg/api/util.go defines LivenessCaptureDraining=2, LivenessCaptureStopping=1 — the numeric values for Draining and Stopping are swapped. While all current conversions between the two types use explicit switch statements that map by enum name (safe), the numeric mismatch is a maintenance risk that could lead to bugs if future code bypasses these conversions via unsafe casting.
Align the numeric values in both enums (recommend setting LivenessCaptureDraining=1, LivenessCaptureStopping=2 in Go to match the proto definition).
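Whichever numbering is chosen, keeping the mapping name-based (as the existing switch-based conversions already do) removes any dependence on the numeric values. The helpers below are an illustrative sketch, not the PR's code; the import path is an assumption based on the repository layout referenced in this review.

```go
package api

import (
	// Import path assumed from the repository layout referenced in this review.
	"github.com/pingcap/ticdc/heartbeatpb"
)

// toProtoLiveness and fromProtoLiveness map by enum name, so the Go and proto
// numeric values never need to agree.
func toProtoLiveness(l Liveness) heartbeatpb.NodeLiveness {
	switch l {
	case LivenessCaptureDraining:
		return heartbeatpb.NodeLiveness_DRAINING
	case LivenessCaptureStopping:
		return heartbeatpb.NodeLiveness_STOPPING
	default:
		return heartbeatpb.NodeLiveness_ALIVE
	}
}

func fromProtoLiveness(n heartbeatpb.NodeLiveness) Liveness {
	switch n {
	case heartbeatpb.NodeLiveness_DRAINING:
		return LivenessCaptureDraining
	case heartbeatpb.NodeLiveness_STOPPING:
		return LivenessCaptureStopping
	default:
		return LivenessCaptureAlive
	}
}
```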
```go
if resignErr := e.resign(ctx); resignErr != nil {
	log.Warn("resign coordinator actively failed", zap.String("nodeID", nodeID), zap.Error(resignErr))
	return errors.Trace(err)
}
```
Pre-existing bug: returning err instead of resignErr on resign failure.
Line 122 returns errors.Trace(err), but at this point err is nil (the campaign succeeded on line 95 and passed the error checks on lines 101-113). This means a resign failure is silently swallowed and the function returns nil. The same bug exists at line 266 for the log coordinator path. While this predates this PR, the surrounding code was modified and this is worth fixing.
Proposed fix
```diff
- return errors.Trace(err)
+ return errors.Trace(resignErr)
```
| log.Warn("resign log coordinator actively failed", | ||
| zap.String("nodeID", nodeID), zap.Error(resignErr)) | ||
| return errors.Trace(err) |
Same pre-existing bug: returning err instead of resignErr at line 266.
Same issue as line 122 — err here is nil from logElection.Campaign(ctx, nodeID) succeeding, so the resign error is lost.
Proposed fix
```diff
- return errors.Trace(err)
+ return errors.Trace(resignErr)
```
What problem does this PR solve?
Issue Number: close #xxx
What is changed and how it works?
Check List
Tests
Questions
Will it cause performance regression or break compatibility?
Do you need to update user documentation, design documentation or monitoring documentation?
Release note
Summary by CodeRabbit
Release Notes
New Features
Bug Fixes