Skip to content

Conversation

@jcongithub
Copy link

This PR fixes the miscalculated maxWait in MoveWithAck using current time instead of sartTime

In MoveWithAck, the maxWait timeout was calculated inside the retry loop using a fixed startTime calculated at the out side of the retry loop, causing it to remain constant across all retry attempts. This means that each retry didn't really check isSameServer and marks all retries failed.

@Apache-HBase

This comment has been minimized.

@Apache-HBase

This comment has been minimized.

@mnpoonia
Copy link
Contributor

mnpoonia commented Nov 30, 2025

It is bit confusing. There are two situations. Whether we want to wait for max move time fro entire region mover or per move. Last code was considering the entire move operations and your code change it to per move. Now i am not sure what is the real intention out of two. Have to check the code once.

@jcongithub
Copy link
Author

@mnpoonia Thanks for the review. In the current code, the wait loop is inside the retry loop, so maxWait should be per-retry. If maxWait were a single absolute deadline for all retries, once the first retry waits up to that deadline, subsequent retries would skip the wait loop because the current time already exceeds the deadline. As a result, isSameServer(...) would not be invoked again, sameServer would never update after the first retry, and the move would fail regardless of later attempts.

@Apache9
Copy link
Contributor

Apache9 commented Dec 5, 2025

I think the logic after the PR is more reasonable, although we lose the ability to limit the whole retry time with a single configuration, but you can still control it with maxRetries * maxWait.

So I'm +1 on the change.

Let me request review for some other committer to see if there are other opinion.

Thanks.

@Apache9 Apache9 changed the title Fix miscalculated maxWait timeout in MoveWithAck HBASE-29734 Fix miscalculated maxWait timeout in MoveWithAck Dec 5, 2025
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a bug in the MoveWithAck retry logic where the maxWait timeout was incorrectly calculated using a fixed startTime value, causing subsequent retry attempts to have expired timeouts immediately. The fix ensures each retry gets a fresh timeout by calculating maxWait using the current time instead.

Key Changes

  • Updated maxWait calculation in the retry loop to use current time instead of the fixed startTime, ensuring each retry attempt gets the full configured wait duration

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@droudnitsky
Copy link
Contributor

I am +1 , it looks like previous retry logic didn't work

you can still control it with maxRetries * maxWait

👍

@jcongithub
Copy link
Author

Thanks all for reviewing the PR. I have raised a related PR for a change in RegionMover to fix timeoutInSeconds calculation to consider retry attempts in MoveWithAck

@stoty
Copy link
Contributor

stoty commented Dec 8, 2025

+1 LGTM

I agree that the current code does not make sense.

This does change the semantics of the timeout. Maybe merge this with #7519 ?
I think that a release note should also be added.

@Apache-HBase
Copy link

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 0s Docker mode activated.
-1 ❌ docker 0m 4s Docker failed to build run-specific yetus/hbase:tp-6897}.
Subsystem Report/Notes
GITHUB PR #7488
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7488/2/console
versions git=2.17.1
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

@Apache-HBase
Copy link

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 44s Docker mode activated.
-0 ⚠️ yetus 0m 3s Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck
_ Prechecks _
_ master Compile Tests _
+1 💚 mvninstall 5m 18s master passed
+1 💚 compile 1m 16s master passed
+1 💚 javadoc 0m 40s master passed
+1 💚 shadedjars 6m 58s branch has no errors when building our shaded downstream artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 4m 12s the patch passed
+1 💚 compile 1m 20s the patch passed
+1 💚 javac 1m 20s the patch passed
+1 💚 javadoc 0m 42s the patch passed
+1 💚 shadedjars 6m 48s patch has no errors when building our shaded downstream artifacts.
_ Other Tests _
-1 ❌ unit 17m 14s /patch-unit-hbase-server.txt hbase-server in the patch failed.
47m 31s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7488/2/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile
GITHUB PR #7488
Optional Tests javac javadoc unit compile shadedjars
uname Linux d0e59812cdbe 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/hbase-personality.sh
git revision master / 6531542
Default Java Eclipse Adoptium-17.0.11+9
Test Results https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7488/2/testReport/
Max. process+thread count 1111 (vs. ulimit of 30000)
modules C: hbase-server U: hbase-server
Console output https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-7488/2/console
versions git=2.34.1 maven=3.9.8
Powered by Apache Yetus 0.15.0 https://yetus.apache.org

This message was automatically generated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants