IGNITE-26584 Discovery optimizations for MultiDC #12517
Conversation
# Conflicts:
#     modules/core/src/test/java/org/apache/ignite/testsuites/IgniteSpiDiscoverySelfTestSuite.java
...core/src/main/java/org/apache/ignite/spi/discovery/tcp/internal/MdcAwareNodesComparator.java
    Iterator<TcpDiscoveryNode> iter = filtered.iterator();
    NavigableSet<TcpDiscoveryNode> sorted = new TreeSet<>(new MdcAwareNodesComparator());
    sorted.addAll(filtered);
I'm not sure that TreeSet always preserves the insertion order of elements that are equal according to the comparator, so it could potentially break something in a non-multi-DC cluster. Let's use the sorting only for a multi-DC environment. It's also a potentially dangerous feature even for multi-DC, so I think it's worth having a flag to disable it explicitly.
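For reference, a minimal standalone sketch (class and element names are illustrative, not from the patch) of what happens when a TreeSet receives elements its comparator treats as equal: the later element is silently dropped rather than kept in insertion order.

    import java.util.TreeSet;

    public class TreeSetEqualElementsDemo {
        public static void main(String[] args) {
            // A comparator that treats every element as equal, mimicking a node
            // comparator that returns 0 for nodes without a distinguishing dcId.
            TreeSet<String> ring = new TreeSet<>((a, b) -> 0);

            ring.add("node-1");
            ring.add("node-2"); // compares as equal to "node-1" and is discarded

            System.out.println(ring); // prints [node-1]
        }
    }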
MdcAwareNodesComparator guarantees that elements are never equal, since it compares dcId first and, if the dcIds match, compares the nodes as usual:

    if (res == 0)
        res = n1.compareTo(n2);
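For illustration, a simplified sketch of the comparison described above (the sketch class itself is hypothetical; dataCenterId() and the fallback to TcpDiscoveryNode.compareTo() are taken from the patch snippets quoted in this thread):

    import java.io.Serializable;
    import java.util.Comparator;

    import org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNode;

    // Hypothetical sketch: order nodes by data center id first, then fall back to
    // the node's usual ordering so that distinct nodes never compare as equal.
    class DcAwareComparatorSketch implements Comparator<TcpDiscoveryNode>, Serializable {
        @Override public int compare(TcpDiscoveryNode n1, TcpDiscoveryNode n2) {
            int res = n1.dataCenterId().compareTo(n2.dataCenterId());

            if (res == 0)
                res = n1.compareTo(n2); // usual node ordering

            return res;
        }
    }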
I agree that having a switch to turn this feature off and fall back to manual management of the ring structure is necessary.
There is always a chance that we miss an edge case during testing and a critical bug in discovery makes it to production. Having a flag as a way to get back to a suboptimal but working implementation is critical in such a situation.
Fixed with:

    private static final boolean mdcAwareRing = IgniteSystemProperties.getBoolean("MDC_AWARE_RING", true);

    Collection<TcpDiscoveryNode> sorted;

    if (mdcAwareRing) {
        sorted = new TreeSet<>(new MdcAwareNodesComparator());
        sorted.addAll(nodes);
    }
    else
        sorted = nodes;
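Assuming the flag stays a plain JVM system property read via IgniteSystemProperties, it could presumably be turned off either with -DMDC_AWARE_RING=false on the command line or programmatically before the discovery classes are initialized, roughly like this (the config path and class name are hypothetical):

    import org.apache.ignite.Ignition;

    public class StartWithPlainRing {
        public static void main(String[] args) {
            // Hypothetical usage: fall back to the plain ring ordering. Must be set
            // before the class holding the static flag is loaded.
            System.setProperty("MDC_AWARE_RING", "false");

            Ignition.start("config/ignite-config.xml");
        }
    }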
...s/core/src/main/java/org/apache/ignite/spi/discovery/tcp/internal/TcpDiscoveryNodesRing.java
    String n1DcId = n1.dataCenterId() == null ? "" : n1.dataCenterId();
    String n2DcId = n2.dataCenterId() == null ? "" : n2.dataCenterId();
For a multi-DC environment we have a check that every node provides a dataCenterId. We can omit the nullability check if the comparator is used only for multi-DC.
Removed the "" defaults.
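A possible shape of the comparison after dropping the "" defaults (hypothetical helper, not the actual patch), making the multi-DC precondition explicit instead of masking a missing dcId:

    import org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNode;

    // Hypothetical helper: relies on multi-DC validation having rejected nodes
    // without a data center id, so no null handling is needed here.
    final class DcIdComparison {
        static int compareDcIds(TcpDiscoveryNode n1, TcpDiscoveryNode n2) {
            assert n1.dataCenterId() != null && n2.dataCenterId() != null :
                "DC-aware comparison must only be used in a multi-DC topology";

            return n1.dataCenterId().compareTo(n2.dataCenterId());
        }
    }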
modules/core/src/test/java/org/apache/ignite/spi/discovery/tcp/MultiDataCenterRignTest.java
    void add(TcpDiscoveryAbstractMessage msg) {
        msgs.add(new PendingMessage(msg));

        while (msgs.size() > MAX) {
As far as I can see, the old implementation had this logic to keep the internal queue bounded and not let it grow close to MAX * 2 right on the add operation.
Now there is no such logic; we clean up the internal collection in the discard call itself.
Do we understand the reasoning behind the old behavior? In which situations does this logic kick in? Does the new implementation perform better or worse in those situations?
This whole discard logic looks very important to me, and I want to make sure that this change doesn't break some corner case with a lagging network that is not covered by our test base.
The old solution (queue limiting) seems to be buggy: in case of a queue overfill we simply lose consistency.
The new solution guarantees consistency and removes elements only when it is safe to do so.
I'm not sure we have deployments of 1K+ nodes where this would NOT happen now, but the code becomes more stable, I think.
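For illustration only, a heavily simplified sketch (hypothetical class, plain long ids instead of the real message ids) contrasting the two strategies discussed above: bounding the queue on add versus removing entries only once they have been explicitly discarded.

    import java.util.ArrayDeque;
    import java.util.Deque;
    import java.util.Iterator;

    // Heavily simplified sketch, not the actual patch.
    class PendingMessagesSketch {
        private static final int MAX = 1024;

        private final Deque<Long> msgs = new ArrayDeque<>();

        /** Old approach (as described above): bound the queue on every add. */
        void addWithTrim(long msgId) {
            msgs.add(msgId);

            while (msgs.size() > MAX)
                msgs.poll(); // oldest entries are dropped even if still needed
        }

        /** New approach (as described above): remove only discarded entries. */
        void discardUpTo(long lastProcessedId) {
            for (Iterator<Long> it = msgs.iterator(); it.hasNext(); ) {
                if (it.next() <= lastProcessedId)
                    it.remove(); // safe: the message has passed around the ring
                else
                    break;
            }
        }
    }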
Thank you for submitting the pull request to Apache Ignite.
In order to streamline the review of the contribution we ask you to ensure the following steps have been taken:
The Contribution Checklist
The description explains WHAT and WHY was made instead of HOW.
The following pattern must be used: IGNITE-XXXX Change summary, where XXXX is the number of the JIRA issue (see the Maintainers list).
The green visa attached to the JIRA ticket (see TC.Bot: Check PR).
Notes
If you need any help, please email dev@ignite.apache.org or ask any advice on http://asf.slack.com #ignite channel.