Write dict_id in zlib header in big-endian style.#207
Write dict_id in zlib header in big-endian style.#207ZhaiMo15 wants to merge 1 commit intointel:masterfrom
Conversation
|
Looks like it should be written as bit-endian. However, how about int isal_read_zlib_header(struct inflate_state *state, struct isal_zlib_header *zlib_hdr)
{
...
if (zlib_hdr->dict_flag) {
case ISAL_ZLIB_DICT:
ret = fixed_size_read(state, &next_in, ZLIB_DICT_LEN);
...The |
|
I don't have a big-endian machine, I can't prove whether it works or not. However, zlib trailer is the same as zlib dict, if zlib trailer worked fine on a big-endian machine, I think zlib dict would work as well. |
|
Does ISA-L support any big-endian machines? x86_64 is little-endian and aarch64 is mostly little-endian, if I remember correctly. |
Good point but the trailer is read as
@iii-i did a lot of work to ensure s390, big-endian was correct but we don't have one in the test pool. |
|
I still think it's the same.
|
|
You can get a big-endian machine for testing here: https://www.ibm.com/community/z/linuxone-cc/login-vm/ |
33f4d02 to
a5266f2
Compare
Signed-off-by: Mo Zhai <zhaimo14@mails.ucas.ac.cn>
|
Is anything blocking this? This seems like a major bug fix (current behavior not in line with zlib format specification). |
|
@rhpvorderman, I don't think this one is correct. I haven't seen any confirmation that this is an actual bug and this fix may introduce a bug if it is currently correct. As I understood it, @MoZhai15 added this from observation in an unrelated issue without any BE testing.
Also this line was changed explicitly by Ilya to store_le_u32() in the port to BE as were all the header writes. There is currently an issue with the igzip_wrapper_hdr_test but if you skip these it reports So I would label this as waiting on test and confirmation. |
|
I added some tests to python-isal. (https://github.com/pycompression/python-isal/blob/a6c2dcff4e2386d7c9b5a6406ccf399ad3a4d3de/tests/test_compat.py#L170)
The later test succeeds. So it looks like ISA-L writes the dict correctly. The former test fails however. So ISA-L apparently cannot parse the zlib dict correctly? Or something goes wrong in the python wrapper code. That is also a possibility. |
|
Hi @rhpvorderman. Is this still an issue? Thanks! |
|
I still believe so, but it's been a long time, I cannot find my demo.
And after this patch, it works fine. Consider of zlib trailer, write_trailer in BE read_trailer in LE Then zlib read dict in LE. zlib write dict in LE. My point is trailer and dict are similar, so the read dict should be BE. |
The test cases in python-isal work. So not an issue anymore. |
|
Thanks! Actually, with this patch, the "igzip_wrapper_hdr_test" unit test fails, so unless the read has to be in big endian, this will break the tests. Since python-isal works, it looks like the header produced by isa-l is the same as the one produced by zlib? |
|
Looks like it. Python-isal tests compatibility against zlib. Maybe the best course of action is to keep the code as is until someone files a bug report about wrong headers? |
Thanks. Looking into those tests, it looks like the header is not actually read, so it won't matter how the dict_id was written, the dictionary is passed directly to both interfaces. Am I right? |
|
Yes, but the header is also parsed by zlib (the zstream is initiated with wbits=15). I don't know if it ignores the dict id if inflateSetDict or deflateSetDict functions have been used beforehand. I am not that knowledgeable on zlib internals. |
|
@rhpvorderman It might be ignoring the header, because I checked and isal is not writing the dict id. As Greg Tucker pointed out here #206 (comment), this is expected and isal_write_zlib_header should be called. |
|
@pablodelara Ah, ok. Well in that case python-isal might do something wrong. It is not a common use case at all for python-isal, so the issue is certainly not a blocker for me. I also do not have time to investigate this further on short notice. Sorry for that. |
|
Thanks for the input. Let's defer this for the next release. It is not clear there is something wrong, as RFC1950 doesn't state the endianness of this dict_id field. |
To use a preset dictionary, a dict_id should be written in zlib header. That information should be written in big-endian style but not little-endian.
Mo Zhai zhaimo14@mails.ucas.ac.cn