Batch small inputs into larger chunks #299

Closed
opened 2026-04-19 07:20:15 +00:00 by fuzzy · 1 comment
No description provided.
fuzzy self-assigned this 2026-04-19 07:20:15 +00:00
fuzzy closed this issue 2026-04-19 15:43:44 +00:00
Implemented in commit 0712bd8:

  • Added `batchEnabled`, `minChunkSize`, and `maxBatchSize` configuration options
  • Enhanced `ChunkSplitter` to read at least `minChunkSize` bytes (unless EOF) and cap each chunk at `maxBatchSize`
  • Updated `ParallelCompressor` and `ParallelDecompressor` to propagate the batching settings
  • Fixed decompression buffer sizing by passing `chunkSize` to `DecompressionJob`

Batching reduces per-chunk overhead by aggregating small reads into larger chunks before compression, significantly improving throughput for workloads that issue many small reads.
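The accumulation logic described above can be sketched roughly as follows. This is a minimal illustration, not the actual `ChunkSplitter` code: the class name `ChunkSplitterSketch` and the `split` helper are hypothetical, and only the stated invariants are assumed (emit a chunk once at least `minChunkSize` bytes are buffered, never exceed `maxBatchSize` per chunk, flush any short remainder at EOF):

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the batching read loop: accumulate at least
// minChunkSize bytes per chunk (unless EOF is reached first), capped at
// maxBatchSize. Names are illustrative, not the project's actual API.
final class ChunkSplitterSketch {
    static List<byte[]> split(InputStream in, int minChunkSize, int maxBatchSize)
            throws IOException {
        List<byte[]> chunks = new ArrayList<>();
        byte[] buf = new byte[maxBatchSize]; // the cap: a chunk never exceeds this
        int filled = 0;
        int n;
        while ((n = in.read(buf, filled, buf.length - filled)) != -1) {
            filled += n;
            // Emit only once enough bytes are buffered to amortize
            // the per-chunk compression overhead.
            if (filled >= minChunkSize) {
                chunks.add(Arrays.copyOf(buf, filled));
                filled = 0;
            }
        }
        if (filled > 0) { // trailing short chunk at EOF
            chunks.add(Arrays.copyOf(buf, filled));
        }
        return chunks;
    }
}
```

For example, with `minChunkSize = 4` and `maxBatchSize = 8`, a 10-byte input yields one 8-byte chunk followed by a 2-byte remainder chunk at EOF, instead of ten 1-byte chunks.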
