Parallel compression for LZ4 by ParaN3xus · Pull Request #158 · nesrak1/AssetsTools.NET

ParaN3xus · 2026-02-21T03:04:35Z

This PR introduces a parallel compression path for LZ4/LZ4Fast in bundle packing. To preserve compatibility with net35 targets, this parallel path is only enabled on non-net35 builds, while net35 continues using the original sequential implementation.

The mechanism is straightforward: it takes a batch of consecutive blocks and compresses them concurrently with Parallel.For. The batch size is configurable through Lz4ParallelPackBatchSize and defaults to 32. I chose this default based on benchmarks on my local laptop (Intel(R) Core(TM) i7-13650HX, 32 GB DDR5 memory, Linux 6.6.87.2-microsoft-standard-WSL2). In the benchmarks, the output went to a byte-counting stream to eliminate the impact of disk I/O.
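For readers who want the shape of the approach without opening the diff, here is a minimal, language-neutral sketch in Python. It is not the C# code from this PR. The names `CountingSink`, `split_blocks`, `compress_block`, `pack_sequential` and `pack_parallel` are hypothetical, zlib stands in for LZ4 because it ships with the standard library, and a thread pool stands in for Parallel.For. The batch size check is only an illustration of the kind of validation a configurable setting needs.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


class CountingSink:
    """Write-only sink that discards data and only counts bytes,
    so a benchmark measures compression rather than disk I/O."""

    def __init__(self):
        self.count = 0

    def write(self, data):
        self.count += len(data)
        return len(data)


def split_blocks(data, block_size):
    """Cut the input into consecutive fixed-size blocks (the last may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def compress_block(block):
    # Stand-in codec (assumption): zlib level 1, NOT LZ4.
    # Each block is compressed independently of its neighbours.
    return zlib.compress(block, 1)


def pack_sequential(blocks, sink):
    """Baseline: compress and write one block at a time."""
    sizes = []
    for block in blocks:
        out = compress_block(block)
        sink.write(out)
        sizes.append(len(out))
    return sizes


def pack_parallel(blocks, sink, batch_size=32, workers=None):
    """Compress `batch_size` consecutive blocks concurrently,
    then write the results in their original order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    sizes = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for start in range(0, len(blocks), batch_size):
            batch = blocks[start:start + batch_size]
            # Executor.map yields results in input order, so the
            # written stream matches the sequential one byte for byte.
            for out in pool.map(compress_block, batch):
                sink.write(out)
                sizes.append(len(out))
    return sizes
```

Because every block is compressed independently and written back in order, the sequential and parallel paths in this sketch produce the same block sizes and the same total byte count. Only the wall-clock time should differ between them.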

Benchmark logs:

Data length: 1,142,981,560 bytes
CPU: 20 logical cores

=== LZ4Fast ===
Warmup: 1, Measured: 3
batch=   1 | avg=  3069.20 ms | min=  2996.73 ms | out= 699,313,746 bytes
batch=   2 | avg=  1949.53 ms | min=  1925.55 ms | out= 699,313,746 bytes
batch=   4 | avg=  1374.78 ms | min=  1360.78 ms | out= 699,313,746 bytes
batch=   8 | avg=  1235.22 ms | min=  1231.19 ms | out= 699,313,746 bytes
batch=  16 | avg=  1274.38 ms | min=  1255.26 ms | out= 699,313,746 bytes
batch=  32 | avg=  1000.09 ms | min=   981.35 ms | out= 699,313,746 bytes
batch=  64 | avg=  1015.87 ms | min=   971.02 ms | out= 699,313,746 bytes
batch=  96 | avg=   979.19 ms | min=   963.95 ms | out= 699,313,746 bytes
batch= 128 | avg=  1018.69 ms | min=  1016.70 ms | out= 699,313,746 bytes
batch= 192 | avg=  1127.88 ms | min=  1005.42 ms | out= 699,313,746 bytes
batch= 256 | avg=   962.32 ms | min=   860.11 ms | out= 699,313,746 bytes
batch= 384 | avg=   941.29 ms | min=   894.33 ms | out= 699,313,746 bytes
batch= 512 | avg=   951.07 ms | min=   918.96 ms | out= 699,313,746 bytes
Best for LZ4Fast: batch=384, avg=941.29 ms, min=894.33 ms

=== LZ4 ===
Warmup: 1, Measured: 3
batch=   1 | avg= 23305.73 ms | min= 23240.59 ms | out= 644,689,762 bytes
batch=   2 | avg= 13358.26 ms | min= 13265.65 ms | out= 644,689,762 bytes
batch=   4 | avg=  8143.68 ms | min=  8137.16 ms | out= 644,689,762 bytes
batch=   8 | avg=  6423.89 ms | min=  6346.14 ms | out= 644,689,762 bytes
batch=  16 | avg=  5454.16 ms | min=  5396.60 ms | out= 644,689,762 bytes
batch=  32 | avg=  5064.66 ms | min=  5032.62 ms | out= 644,689,762 bytes
batch=  64 | avg=  5174.30 ms | min=  5108.39 ms | out= 644,689,762 bytes
batch=  96 | avg=  4748.79 ms | min=  4643.74 ms | out= 644,689,762 bytes
batch= 128 | avg=  5115.96 ms | min=  4897.92 ms | out= 644,689,762 bytes
batch= 192 | avg=  5628.60 ms | min=  5033.43 ms | out= 644,689,762 bytes
batch= 256 | avg=  5217.21 ms | min=  4852.48 ms | out= 644,689,762 bytes
batch= 384 | avg=  4272.61 ms | min=  4254.14 ms | out= 644,689,762 bytes
batch= 512 | avg=  4529.81 ms | min=  4433.02 ms | out= 644,689,762 bytes
Best for LZ4: batch=384, avg=4272.61 ms, min=4254.14 ms
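To relate the default of 32 to these logs, the averages can be read as speedups over batch=1. The sketch below uses only values copied from the logs above; the helper names `speedup` and `throughput_mb_s` are illustrative, and throughput is computed against the uncompressed input size with 1 MB taken as 10^6 bytes.

```python
# Input size and average times (ms) copied from the benchmark logs above.
DATA_LEN_BYTES = 1_142_981_560
LZ4FAST_AVG_MS = {1: 3069.20, 32: 1000.09, 384: 941.29}
LZ4_AVG_MS = {1: 23305.73, 32: 5064.66, 384: 4272.61}


def speedup(avg_ms, batch):
    """Speedup of a batch size relative to batch=1 (the sequential case)."""
    return avg_ms[1] / avg_ms[batch]


def throughput_mb_s(avg_ms, batch, data_len=DATA_LEN_BYTES):
    """Uncompressed input processed per second, in MB/s (1 MB = 10**6 bytes)."""
    seconds = avg_ms[batch] / 1000.0
    return data_len / seconds / 1e6


if __name__ == "__main__":
    for name, table in (("LZ4Fast", LZ4FAST_AVG_MS), ("LZ4", LZ4_AVG_MS)):
        for batch in (32, 384):
            print(f"{name:8s} batch={batch:3d} "
                  f"speedup={speedup(table, batch):.2f}x "
                  f"throughput={throughput_mb_s(table, batch):.0f} MB/s")
```

With these numbers, batch=32 comes out at roughly 3.1x for LZ4Fast and 4.6x for LZ4, against roughly 3.3x and 5.5x at the best measured batch size of 384.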

I know this implementation does not yet use all of the available CPU capacity, but in my observation it keeps CPU utilization consistently above 80%, which is sufficient for most cases.

ParaN3xus added 2 commits February 21, 2026 10:49

feat: parallel compress for lz4

ca12983

refactor: add check for batch size set

207a574

ParaN3xus wants to merge 2 commits into nesrak1:main from ParaN3xus:feat-parallel-compression
