Avoid hangs and debug asserts on invalid parameters for Zipf #41
CHANGELOG.md entry
Summary
The Zipf distribution sampler could panic (at debug_assert!(t > F::zero());) when given invalid parameters (like s = inf), or hang in an infinite loop when sampling. This change rejects parameter combinations for which the generalized Zipf distribution is not well defined, and slightly modifies Zipf::new() so that sampling avoids the debug assert and produces the correct value when s = inf.
Motivation
This is another simple panic / hang issue that I found by fuzzing input parameters.
Details
The normalization constant (sum from i=1 to n of 1/i^s) of the generalized Zipf distribution needs to be positive and finite for the distribution to be well defined. If s <= 1 and n = inf, then the sum in the constant diverges.
If s = inf, then each term of the constant is 0, so the specific formula in Wikipedia will not work; however, the Zipf distribution approaches the distribution concentrated on i = 1 as s -> inf. The previous implementation computed a nan in Zipf::new which triggered the debug assertion (or an infinite loop), but a small change avoids both.
(Edit: originally this PR rejected the case s = inf, but after posting it I changed my mind; in general, if a limiting distribution exists and is close enough to the distribution at the largest finite parameters, then producing that limiting distribution is safe and avoids having the library user write a special case to produce effectively that limiting distribution.)
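To make the rule above concrete, here is a minimal sketch of the kind of parameter check described. It is not the code from this PR: the names ParamError, check_params and prob_of_one are hypothetical, and the requirements that s be non-negative and n be at least 1 are assumptions of the sketch. Only the rejection of n = inf with s <= 1 comes directly from the description. The second helper evaluates the normalization sum for finite n, to show how the mass on i = 1 approaches 1 as s grows.

```rust
/// Hypothetical error type for this sketch (not the crate's actual error enum).
#[derive(Debug, PartialEq)]
enum ParamError {
    /// s is negative or NaN (assumed requirement).
    STooSmall,
    /// n is below 1 or NaN (assumed requirement).
    NTooSmall,
    /// n = inf with s <= 1: the normalization sum diverges.
    IllDefined,
}

/// Hypothetical validation helper mirroring the rule in the description.
/// Note that s = inf is accepted: it corresponds to the limiting
/// distribution concentrated on i = 1.
fn check_params(n: f64, s: f64) -> Result<(), ParamError> {
    // Written as negated comparisons so that NaN is rejected too.
    if !(s >= 0.0) {
        return Err(ParamError::STooSmall);
    }
    if !(n >= 1.0) {
        return Err(ParamError::NTooSmall);
    }
    if n.is_infinite() && s <= 1.0 {
        return Err(ParamError::IllDefined);
    }
    Ok(())
}

/// Probability of drawing i = 1 for a finite n and finite s:
/// 1 / (sum from i=1 to n of 1/i^s).
fn prob_of_one(n: u32, s: f64) -> f64 {
    let norm: f64 = (1..=n).map(|i| (i as f64).powf(-s)).sum();
    1.0 / norm
}
```

With these definitions, check_params(f64::INFINITY, 1.0) returns Err(ParamError::IllDefined), while check_params(f64::INFINITY, 2.0) and check_params(10.0, f64::INFINITY) are both accepted. For a large finite exponent such as s = 60 with n = 1000, prob_of_one is already indistinguishable from 1 at f64 precision, which is the sense in which the limiting distribution is close to the distribution at the largest finite parameters.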