Skip to content

Conversation

@anton-vinogradov
Copy link
Contributor

Thank you for submitting the pull request to the Apache Ignite.

In order to streamline the review of the contribution
we ask you to ensure the following steps have been taken:

The Contribution Checklist

  • There is a single JIRA ticket related to the pull request.
  • The web-link to the pull request is attached to the JIRA ticket.
  • The JIRA ticket has the Patch Available state.
  • The pull request body describes changes that have been made.
    The description explains WHAT and WHY was made instead of HOW.
  • The pull request title is treated as the final commit message.
    The following pattern must be used: IGNITE-XXXX Change summary where XXXX - number of JIRA issue.
  • A reviewer has been mentioned through the JIRA comments
    (see the Maintainers list)
  • The pull request has been checked by the Teamcity Bot and
    the green visa attached to the JIRA ticket (see TC.Bot: Check PR)

Notes

If you need any help, please email dev@ignite.apache.org or ask anу advice on http://asf.slack.com #ignite channel.


Iterator<TcpDiscoveryNode> iter = filtered.iterator();
NavigableSet<TcpDiscoveryNode> sorted = new TreeSet<>(new MdcAwareNodesComparator());
sorted.addAll(filtered);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if TreeSet always preserves order of added elements if all elements are equal according to comparator, so it potentianly can break something in non-multi-DC cluster. Let's use sorting only for multi-DC environment. Also it's a potentially dangerous feature even for multi-DC, I think it's worth to have a flag to disable it explicitely.

Copy link
Contributor Author

@anton-vinogradov anton-vinogradov Jan 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MdcAwareNodesComparator guarantee elements to be non-equal since it compares dcId and if they are equal it compares nodes as usual.

if (res == 0)
   res = n1.compareTo(n2);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that having a switch to turn this feature off and fall back to a manual management of ring structure is necessary.

There is always a chance that we miss an edge case during testing and a critical bug in discovery makes it to production. Having a flag as a way to get back to a suboptimal but working implementation is critical in such situation.

Copy link
Contributor Author

@anton-vinogradov anton-vinogradov Jan 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed with

private static final boolean mdcAwareRing = IgniteSystemProperties.getBoolean("MDC_AWARE_RING", true);
Collection<TcpDiscoveryNode> sorted;

if (mdcAwareRing) {
    sorted = new TreeSet<>(new MdcAwareNodesComparator());
    sorted.addAll(nodes);
}
else
    sorted = nodes;

Comment on lines 26 to 27
String n1DcId = n1.dataCenterId() == null ? "" : n1.dataCenterId();
String n2DcId = n2.dataCenterId() == null ? "" : n2.dataCenterId();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For multi-DC environment we have a check, that all dataCenterId is provided. We can ommit nullability check if comparator will be used only for multi-DC.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the "" defaults.

void add(TcpDiscoveryAbstractMessage msg) {
msgs.add(new PendingMessage(msg));

while (msgs.size() > MAX) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can see, the old implementation have this logic to maintain internal queue size and not let it get close to MAX * 2 size right on add operation.

Now there is no such logic, we clean up internal collection on discard call itself.

Do we understand the logic behing this old behavior? What are situations when this logic starts working? Does new implementation perform better or worse in these situations?

This whole discard logic looks very important to me, and I want to make sure that this change doesn't break some corner case with lagging network which is not covered with our test base.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Old sollution (queue limiting) seems to be a buggy. In case of a queue overfill we just lose the consistency.
New solution guarantee the consistency and cleans the elements only when it's safe to clean.
Not sure we have deployments for 1K+ nodes where this will NOT happen now, but code becomes more stable, I think.

anton-vinogradov and others added 6 commits January 23, 2026 17:46
…/MultiDataCenterRignTest.java

Co-authored-by: Sergey Chugunov <sergey.chugunov@gmail.com>
@sonarqubecloud
Copy link

Quality Gate Failed Quality Gate failed

Failed conditions
5 New Code Smells (required ≤ 1)

See analysis details on SonarQube Cloud

Catch issues before they fail your Quality Gate with our IDE extension SonarQube for IDE

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants