Skip to content

Conversation

Copy link
Contributor

Copilot AI commented Dec 7, 2025

Several functions were unnecessarily restricted to C-based zlib backends despite working with pure Rust backends like zlib-rs.

Changes

Changed feature gate from any_zlib to any_impl for:

  • Compress::new_with_window_bits() - Custom window size for compression
  • Compress::new_gzip() - Gzip-formatted compression
  • Decompress::new_with_window_bits() - Custom window size for decompression
  • Decompress::new_gzip() - Gzip-formatted decompression

These functions work across all backends because:

  • Backend trait make() methods already accept window_bits parameter
  • Gzip mode uses standard window_bits + 16 convention supported universally

Dictionary functions (set_dictionary, set_level) remain any_zlib-only as they require direct stream manipulation not exposed in zlib-rs stable API.

Tests

Added validation for newly enabled functions:

  • Window size variations (9, 12, 15)
  • Boundary validation (panics on invalid values)
  • Round-trip compression/decompression
// Now works with zlib-rs backend
let mut encoder = Compress::new_gzip(Compression::default(), 9);
encoder.compress_vec(data, &mut output, FlushCompress::Finish)?;

let mut decoder = Decompress::new_gzip(9);
decoder.decompress(&output, &mut decoded, FlushDecompress::Finish)?;
Original prompt

#517 See how the functions that rely on #[cfg(feature = "any_zlib")] can also be implemented when zlib-rs is set, which doesn't use the C FFI anymore. Add tests that validate that these functions are present with zlib-rs.
If one of these can't be implemented with zlib-rs, that's fine too.


💬 We'd love your input! Share your thoughts on Copilot coding agent in our 2 minute survey.

Copilot AI changed the title [WIP] Add tests for functions with zlib-rs support Enable window_bits and gzip functions for zlib-rs backend Dec 7, 2025
Copilot AI requested a review from Byron December 7, 2025 08:23
Copilot AI and others added 2 commits December 7, 2025 09:37
Co-authored-by: Byron <63622+Byron@users.noreply.github.com>
- cargo fmt
@Byron Byron force-pushed the copilot/add-tests-for-zlib-rs-functionality branch from 8f2cdbb to b4993c4 Compare December 7, 2025 08:38
@Byron Byron force-pushed the copilot/add-tests-for-zlib-rs-functionality branch from 977d847 to d8a67e7 Compare December 7, 2025 08:53
@Byron Byron linked an issue Dec 7, 2025 that may be closed by this pull request
@Byron
Copy link
Member

Byron commented Dec 7, 2025

@folkertdev I'd need your help here 🙏.

I think this PR shows that set_dictionary and set_level currently aren't supported for zlib-rs because they need more access. Is this something you think zlib-rs will support (or does already?) so we can merge this PR and have a release that isn't considered breaking?
Thanks again for your help.

@folkertdev
Copy link
Contributor

I would have expected this PR to break CI given that zlib-rs does not currently expose the API to implement those features?

I'll need to make a quick release exposing some additional things. We have the functionality, just not the API.

@Byron
Copy link
Member

Byron commented Dec 9, 2025

I would have expected this PR to break CI given that zlib-rs does not currently expose the API to implement those features?

Actually the test doesn't check for set_dictionary and set_level for that reason, but everything else is exposed.

Once we have access to these via the Rust API, the test can be adjusted to also validate them.

In theory, it could then be copy-pasted for each of the other implementations.

In any case, thanks for your help, it's much appreciated!

@folkertdev
Copy link
Contributor

folkertdev commented Dec 9, 2025

Some fun kind of undocumented behavior: the native zlib backends count the dictionary length towards total_in, but the flate2 bookkeeping does not (it only counts input to compress and decompress). Is that deviation deliberate?

@Byron
Copy link
Member

Byron commented Dec 9, 2025

Maybe @jongiddy has more context, as this question is way over my head.

Knowing that some models as built into the GH chat window have access to a lot of context and can do some thinking/planning, I thought maybe sharing their output here can be useful. Maybe not, I really can't even decide that 😅.

GPT 4.1 Mini

Short answer: probably a deliberate bookkeeping choice in flate2 rather than a spec bug in zlib. zlib's C implementation treats the preset dictionary as part of the stream history and increments its internal total_in/total_out counters to reflect that the stream now "contains" those bytes; flate2's counters reflect only the bytes that were actually passed to the compress()/decompress() calls by the caller (i.e., explicit input), and so do not add in the implicit dictionary length.

Why that difference exists (summary)

  • zlib C API semantics: when you give zlib a preset dictionary (deflateSetDictionary / inflateSetDictionary) the library internally takes that dictionary into account as stream history; the library’s total_in may be updated to include the dictionary length because the library views those bytes as part of the stream state.
  • flate2 bookkeeping: flate2 exposes total_in/total_out that count bytes passed through its API, which excludes any bytes that were not actually passed to compress()/decompress() (so a dictionary you hand in separately is excluded).
  • Result: two backends that both produce correct compressed output may still report different total_in values because they're answering slightly different questions about what "total input" means.

Is it deliberate?

  • It’s plausible and reasonable that flate2 authors intentionally count only the explicit API input (that keeps the meaning clear: “how many bytes did you hand me in these calls?”).
  • It could also be an implementation divergence (or an oversight) between backends — the important point is that the semantics are not identical across backends today.

What I recommend

  1. If you need consistent semantics in your code, treat flate2’s total_in as “bytes passed to the API” and, when you need parity with zlib’s native behavior, add dict.len() yourself to the reported total_in when you supplied a preset dictionary to the native backend.
  2. If you think flate2 should document or change this behavior (or at least make it consistent across backends), open an issue on the flate2 repository with a minimal repro and the exact backend/feature flags you used (native zlib vs pure-Rust backend).
  3. When opening the issue, include:
    • The flate2 version and the backend feature you used (native zlib vs miniz_oxide / pure Rust).
    • A tiny reproducible program: compress or decompress with an explicit preset dictionary, and print total_in from both backends so maintainers can reproduce the discrepancy.
    • The exact numbers you observed and what you expected.
    • Why you care about total_in including/excluding the dictionary (e.g., for progress reporting, accounting, or compatibility with other tools).

Suggested issue title/body (you can paste/edit this):

Title:
total_in behavior differs from system zlib when using a preset dictionary

Body:
Summary:
When using a preset dictionary, the system zlib backend increases zlib's total_in by the dictionary length, but flate2's bookkeeping (the value returned from total_in()) does not include the dictionary length. This makes total_in differ across backends.

Repro:

  • flate2 version:
  • Backends tested: system zlib (feature X), pure-Rust backend (feature Y)
  • Minimal steps: (1) set a preset dictionary, (2) call compress/decompress on some data, (3) read total_in().

Observed:

  • system zlib backend: total_in == bytes_passed + dict.len()
  • flate2 bookkeeping: total_in == bytes_passed

Expected / question:
Should flate2:

  • document that total_in excludes preset dictionary length, or
  • change behavior to include the dictionary length for parity with system zlib, or
  • expose both values (explicit input vs stream-history-inclusive total)?

Workaround:
Add dict.len() to total_in yourself when you need the dictionary counted.

If you’d like, I can help you draft a minimal reproducible Rust snippet to include in the issue (or a short test) showing the differing totals for the two backends. Which backend(s) and flate2 version are you testing?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1.1.7 have breaking change about feature zlib-rs

3 participants