Skip to content

Conversation

@mstoeckl
Copy link
Contributor

@mstoeckl mstoeckl commented Dec 3, 2025

  • Added a CHANGELOG.md entry

Summary

The Zipf distribution sampler could panic (at debug_assert!(t > F::zero());) when given invalid parameters (like s = inf); or hang in an infinite loop when sampling. This change rejects parameter combinations for which the generalized Zipf distribution is not well defined, and slightly modifies Zipf::new() so that sampling avoids the debug assert and produces the correct value when s = inf.

Motivation

This another simple panic / hang issue that I found by fuzzing input parameters.

Details

The normalization constant (sum from i=1 to n of 1/i^s) of the generalized Zipf distribution needs to be positive and finite for the distribution to be well defined. if s <= 1 and n = inf, then the sum in the constant diverges.

If s = inf, then each term of the constant is 0, so the specific formula in Wikipedia will not work; however, the Zipf distribution approaches the distribution concentrated on i=1 as s -> inf. The previous implementation computed a nan in Zipf::new which triggered the debug assertion (or an infinite loop), but a small change avoids both.

(Edit: originally this PR rejected the case s = inf, but after posting it I changed my mind; in general, if a limiting distribution exists and is close enough to the distribution at the largest finite parameters, then producing that limiting distribution is safe and avoids having the library user write a special case to produce effectively that limiting distribution.)

@mstoeckl mstoeckl marked this pull request as draft December 3, 2025 01:46
@mstoeckl mstoeckl changed the title Avoid panics on invalid parameters for Zipf Avoid hangs and debug asserts on invalid parameters for Zipf Dec 3, 2025
@mstoeckl mstoeckl marked this pull request as ready for review December 3, 2025 02:07
@mstoeckl mstoeckl force-pushed the zipf-edge-case branch 2 times, most recently from ec0bf49 to 227e0dd Compare December 3, 2025 11:05
@dhardy
Copy link
Member

dhardy commented Dec 4, 2025

I think inv_cdf also needs a special case for s = inf? Perhaps not if it's safe to assume that p <= 1. Maybe that should be a debug_assert.

@mstoeckl
Copy link
Contributor Author

mstoeckl commented Dec 4, 2025

I think inv_cdf also needs a special case for s = inf? Perhaps not if it's safe to assume that p <= 1. Maybe that should be a debug_assert.

It should be OK. The range of possible inputs for p in inv_cdf should be [0,1]; anything outside this is not a probability value. Zipf::sample is the only caller and provides it a sample from StandardUniform (in [0,1)).

@mstoeckl mstoeckl force-pushed the zipf-edge-case branch 2 times, most recently from 9b7a6e4 to 924ee37 Compare December 9, 2025 11:06
The generalized Zipf distribution is not well defined when its
normalization constant is infinity (when n is inf and s is <= 1).

Also, when s is inf, there is a reasonable limiting distribution,
which can be produced with a small special case in Zipf::new.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants