Dense zero trie Out of Range handling #7303 #7305
base: main
… row values to go in dense matrix
sffc
left a comment
Nice work! Some suggestions
utils/zerotrie/tests/dense_test.rs
Outdated
let dense_size = check_encoding(trie);
let simple_size = make_simple_ascii_trie(&data).byte_len();

println!("Dense size: {}, Simple size: {}", dense_size, simple_size);
Please add assertions on the two sizes. We assert the sizes because they should be stable, and because you can see what they are from reading the code. They should only change if we change the data structure or builder algorithm.
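As a minimal sketch of what such a size assertion can look like, the example below uses a toy length-prefixed encoder. `toy_encode` and `toy_encoded_size` are hypothetical stand-ins, not part of zerotrie, and the number 14 applies only to this toy format. The point is that the expected size can be derived by reading the code, so the assertion only changes when the format or the builder changes.

```rust
use std::collections::BTreeMap;

/// Toy length-prefixed encoding (hypothetical; NOT the zerotrie format).
/// Each entry is: 1-byte key length, the key bytes, 1-byte value.
fn toy_encode(data: &BTreeMap<&str, u8>) -> Vec<u8> {
    let mut out = Vec::new();
    for (key, value) in data {
        out.push(key.len() as u8); // length prefix (keys here are short)
        out.extend_from_slice(key.as_bytes());
        out.push(*value);
    }
    out
}

/// Builds fixed test data and returns the encoded size in bytes.
fn toy_encoded_size() -> usize {
    let mut data = BTreeMap::new();
    data.insert("low", 1u8);
    data.insert("a", 2u8);
    data.insert("high", 3u8);
    toy_encode(&data).len()
}

#[test]
fn size_is_stable() {
    // "low" = 1 + 3 + 1, "a" = 1 + 1 + 1, "high" = 1 + 4 + 1
    assert_eq!(toy_encoded_size(), 14);
}
```

In the test under review, the same pattern would replace the println! call with one assertion per size, using the values the current builder produces.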
@sffc I see that two tests are failing right now, but I'm not really sure how to recreate that on my side - is there something I can run or look at or is this related to me? And is there anything else I should do?

The branch might be stale. I updated it to main

Ah awesome, thanks! Let me know if you need anything else to review.
sffc
left a comment
Thanks for your work and sorry for the slow follow-up!
let top_val = sorted_vals.get(top).copied().unwrap_or(0);
let bot_val = sorted_vals.get(bot).copied().unwrap_or(0);
Issue: Here and everywhere else you changed the code to use .unwrap_or(0):
It is easy to see that both top and bot are < sorted_vals.len(), so our usual style is to use the index operator and add a suppression with a comment like
#[allow(clippy::indexing_slicing)] // bot < top and top < sorted_vals.len()
We use the .unwrap_or style in GIGO (garbage in, garbage out) situations, where the input could be invalid, but in this case it is an easily provable code constraint.
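The two styles can be contrasted in a short sketch. The function names and the way bot and top are computed here are assumptions made for illustration and are not taken from the PR. What matters is that in the first function the indices are derived from the slice length, so the bounds are provable, while in the second the index comes from outside.

```rust
/// Hypothetical helper: span between two inner positions of a sorted slice.
/// Both indices are computed from `sorted_vals.len()`, so they are in range.
fn trimmed_span(sorted_vals: &[usize]) -> usize {
    if sorted_vals.is_empty() {
        return 0;
    }
    let bot = sorted_vals.len() / 4;
    let top = sorted_vals.len() - 1 - bot;
    #[allow(clippy::indexing_slicing)] // bot <= top and top < sorted_vals.len()
    let (bot_val, top_val) = (sorted_vals[bot], sorted_vals[top]);
    // The input is expected to be sorted; saturate rather than panic if it is not.
    top_val.saturating_sub(bot_val)
}

/// GIGO-style lookup: `index` is caller-supplied and may be invalid,
/// so an out-of-range index yields 0 instead of a panic.
fn value_or_zero(vals: &[usize], index: usize) -> usize {
    vals.get(index).copied().unwrap_or(0)
}
```

With the input [1, 2, 3, 10], trimmed_span reads positions 1 and 2 and returns 1, while value_or_zero(&[7, 8], 5) returns 0 because index 5 is out of range.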
let mut inner = BTreeMap::new();
inner.insert("low", far_low);
inner.insert("a", cluster_vals.first().copied().unwrap_or(0));
Nit: Please inline the cluster_vals values to make it more consistent with the rest of the map insertions.
Working on #7303.
When creating a dense sparse zerotrie, if the incoming values are too spread out for a dense matrix:
I also added a test case for this. I am not certain that the test case is sufficient; this is my first contribution here.