Skip to content

Conversation

@Cdog1526
Copy link

Working on #7303.

When creating a dense sparse zerotrie, if the incoming values are too spread out for a dense matrix:

  • Select the row offset that gets the greatest number of values into the dense matrix.
  • Put all out-of-range values into the primary trie, as we do with sparse rows.

Also, added a test case for this. I am not certain that this test case is sufficient, this is my first contribution here.

@Cdog1526 Cdog1526 requested a review from a team as a code owner December 11, 2025 07:39
@CLAassistant
Copy link

CLAassistant commented Dec 11, 2025

CLA assistant check
All committers have signed the CLA.

@Manishearth Manishearth requested a review from sffc December 11, 2025 17:19
Copy link
Member

@sffc sffc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! Some suggestions

let dense_size = check_encoding(trie);
let simple_size = make_simple_ascii_trie(&data).byte_len();

println!("Dense size: {}, Simple size: {}", dense_size, simple_size);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add assertions on the two sizes. We assert the sizes because they should be stable, and because you can see what they are from reading the code. They should only change if we change the data structure or builder algorithm.

@sffc sffc self-requested a review December 12, 2025 07:05
@Cdog1526
Copy link
Author

@sffc I see that two tests are failing right now, but I'm not really sure how to recreate that on my side - is there something I can run or look at or is this related to me? And is there anything else I should do?

@sffc
Copy link
Member

sffc commented Dec 21, 2025

The branch might be stale. I updated it to main

@Cdog1526
Copy link
Author

Ah awesome, thanks! Let me know if you need anything else to review.

Copy link
Member

@sffc sffc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your work and sorry for the slow follow-up!

Comment on lines +41 to +42
let top_val = sorted_vals.get(top).copied().unwrap_or(0);
let bot_val = sorted_vals.get(bot).copied().unwrap_or(0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue: Here and everywhere else you changed to use .unwrap_or(0):

It is easy to see that both top and bot are < sorted_vales.len(), so typically our style is to use the index operator and add a suppression with comment like

#[allow(clippy::indexing_slicing)] // bot < top and top < sorted_vals.len()

We do the .unwrap_or style in GIGO situations, where the input could be invalid, but in this case it is an easily provable code constraint.


let mut inner = BTreeMap::new();
inner.insert("low", far_low);
inner.insert("a", cluster_vals.first().copied().unwrap_or(0));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Please inline the cluster_vals values to make it more consistent with the rest of the map insertions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants