An optimized, parallel, and thread-safe port of libbz2 to pure Go.
Mike 'Fuzzy' Partin fc9206b41b
docs: update README with expanded benchmarks and API docs link
feat: add b.SetBytes to all benchmark tests for accurate byte counting
refactor: improve README wording for parallel compression and CGo bindings
perf: add larger dataset benchmarks for better throughput analysis
style: update benchmark tables with more data points and improved formatting
chore: update API documentation link format in README

# Previous message:
feat: add throughput reporting to benchmarks and update performance table

- Add b.SetBytes() to all benchmark functions for MB/s reporting
- Extend performance table in README.md with 100MB, 256MB, 512MB results
- Include parallel compression throughput in table
- Update speedup factor for parallel decompression (2.64× for 10MB)
2026-04-20 12:33:57 -07:00
.github/workflows feat: add cross-platform CI testing and error handling improvements 2026-04-19 12:58:29 -07:00
test refactor: simplify stream compression logic and remove parallel compressor usage 2026-04-19 06:10:08 -07:00
.gitignore feat: add static linking support with build tags 2026-04-19 00:40:12 -07:00
.pre-commit-config.yaml feat: implement huffman coding and crc32 checksum for bzip2 compatibility 2026-04-15 21:22:39 -07:00
benchmark_batching_test.go feat: implement input batching and comprehensive benchmarking suite 2026-04-19 08:43:22 -07:00
benchmark_comparison_test.go feat: add large data benchmarks and skip large data in system bzip2 tests 2026-04-20 11:50:36 -07:00
benchmark_test.go docs: update README with expanded benchmarks and API docs link 2026-04-20 12:33:57 -07:00
buffer.go feat: add MaxDecompressedSize function for bzip2 decompression sizing 2026-04-20 11:49:50 -07:00
buffer_test.go Phase 7: Testing, documentation, and release prep 2026-04-19 11:08:48 -07:00
bz2.go docs: add comments to bz2 action constants 2026-04-19 11:45:55 -07:00
bz2_test.go Phase 7: Testing, documentation, and release prep 2026-04-19 11:08:48 -07:00
compressor.go refactor: use MaxDecompressedSize for size estimation 2026-04-20 11:50:07 -07:00
COPYING chore: rename license file from LICENSE to COPYING 2026-04-19 13:25:22 -07:00
debug.go feat: add cross-platform CI testing and error handling improvements 2026-04-19 12:58:29 -07:00
debug_noop.go feat: add cross-platform CI testing and error handling improvements 2026-04-19 12:58:29 -07:00
example_test.go feat: add cross-platform CI testing and error handling improvements 2026-04-19 12:58:29 -07:00
format.go feat: add cross-platform CI testing and error handling improvements 2026-04-19 12:58:29 -07:00
fuzz_test.go Phase 7: Testing, documentation, and release prep 2026-04-19 11:08:48 -07:00
go.mod feat: add go module definition for gobz2 project 2026-04-15 12:31:49 -07:00
integration_test.go Phase 7: Fix integration tests and edge cases 2026-04-19 11:40:07 -07:00
interfaces.go feat: implement input batching and comprehensive benchmarking suite 2026-04-19 08:43:22 -07:00
parallel.go feat: add cross-platform CI testing and error handling improvements 2026-04-19 12:58:29 -07:00
parallel_decompress.go feat: implement dynamic buffer sizing and size limit for decompression jobs 2026-04-20 11:50:20 -07:00
parallel_test.go Phase 7: Fix integration tests and edge cases 2026-04-19 11:40:07 -07:00
pool.go Phase 7: Fix integration tests and edge cases 2026-04-19 11:40:07 -07:00
README.md docs: update README with expanded benchmarks and API docs link 2026-04-20 12:33:57 -07:00
ROADMAP.md feat: add cross-platform CI testing and error handling improvements 2026-04-19 12:58:29 -07:00
stream.go feat: add cross-platform CI testing and error handling improvements 2026-04-19 12:58:29 -07:00
stream_compress.go feat: add cross-platform CI testing and error handling improvements 2026-04-19 12:58:29 -07:00
stream_test.go Phase 7: Testing, documentation, and release prep 2026-04-19 11:08:48 -07:00
streaming.go Phase 7: Fix integration tests and edge cases 2026-04-19 11:40:07 -07:00
USAGE.md feat: add cross-platform CI testing and error handling improvements 2026-04-19 12:58:29 -07:00

gobz2

Parallel bzip2 compression/decompression library for Go, leveraging libbz2 via CGo for maximum performance.

Features

  • Parallel compression: Use multiple CPU cores for faster compression
  • CGo bindings: Direct libbz2 integration for optimal speed/compatibility
  • Thread-safe: Safe for concurrent use across goroutines
  • Streaming API: Support for io.Reader/io.Writer interfaces
  • Memory efficient: Configurable buffer pools and allocation strategies

Requirements

  • Go 1.25 or later
  • libbz2 development headers (libbz2-dev on Debian/Ubuntu, bzip2-devel on RHEL/Fedora)

Installation

go get git.lan.thwap.org/thwap/gobz2

Static Linking

To link statically against libbz2, use the static build tag:

go build -tags static ./...

This will produce a fully statically linked binary (including libc). Note: Static linking requires the static library libbz2.a (often provided by libbz2-dev or bzip2-devel packages).

Versioning

This project follows Semantic Versioning. Given a version number MAJOR.MINOR.PATCH:

  • MAJOR version for incompatible API changes
  • MINOR version for new functionality added in a backward compatible manner
  • PATCH version for backward compatible bug fixes

The public API consists of the exported functions, types, and constants documented in the godoc.

Quick Start

package main

import (
    "bytes"
    "fmt"
    "git.lan.thwap.org/thwap/gobz2"
)

func main() {
    data := []byte("Hello, world! Parallel bzip2 compression test.")

    // Compress data
    compressed, err := gobz2.CompressBytes(data)
    if err != nil {
        panic(err)
    }

    fmt.Printf("Compressed %d bytes to %d bytes (%.1f%%)\n",
        len(data), len(compressed), 100*float64(len(compressed))/float64(len(data)))

    // Decompress data
    decompressed, err := gobz2.DecompressBytes(compressed)
    if err != nil {
        panic(err)
    }

    if bytes.Equal(data, decompressed) {
        fmt.Println("Data integrity verified!")
    }
}

Performance Benchmarks

Benchmarks run on AMD Ryzen 5 5600 (6 cores) with Go 1.25, Linux.

Throughput (MB/s)

Operation              1 KB  100 KB   1 MB  10 MB  100 MB  256 MB  512 MB
Compress (single)       3.6    13.1   14.9   70.9    77.2    79.7    78.0
Decompress (single)    11.8    20.3   24.1   53.9   110.5   158.4   170.1
Parallel compress         -       -   14.9   66.5    70.9    68.1    68.8
Parallel decompress       -       -   28.6  142.1   165.3   161.3   163.8

Figures are the median of 3 runs. Parallel decompression shows a significant speedup (2.64× for 10 MB data) by using multiple CPU cores. Parallel compression throughput is similar to single-threaded throughput for these sizes due to chunking overhead; benefits increase with larger datasets.
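For readers who want to reproduce MB/s figures of this kind, here is a minimal sketch of how throughput reporting works in Go benchmarks: calling b.SetBytes tells the testing package how many bytes each iteration processes. The workload below is a CRC-32 checksum used purely as a placeholder, and benchChecksum, payload, and mbPerSec are hypothetical names; none of this is taken from the gobz2 benchmark files.

```go
package main

import (
	"fmt"
	"hash/crc32"
	"testing"
)

// payload is a 1 MiB stand-in input for the placeholder workload.
var payload = make([]byte, 1<<20)

// benchChecksum is a placeholder benchmark (CRC-32), not gobz2 code.
// b.SetBytes records the bytes processed per iteration so that
// throughput can be derived from the result.
func benchChecksum(b *testing.B) {
	b.SetBytes(int64(len(payload)))
	for i := 0; i < b.N; i++ {
		_ = crc32.ChecksumIEEE(payload)
	}
}

// mbPerSec derives throughput from a benchmark result:
// bytes per iteration * iterations / elapsed seconds, with 1 MB = 1e6 bytes.
func mbPerSec(r testing.BenchmarkResult) float64 {
	if r.Bytes <= 0 || r.N <= 0 || r.T <= 0 {
		return 0
	}
	return float64(r.Bytes) * float64(r.N) / 1e6 / r.T.Seconds()
}

func main() {
	// testing.Benchmark runs the function outside of "go test".
	r := testing.Benchmark(benchChecksum)
	fmt.Printf("%d iterations, %.1f MB/s\n", r.N, mbPerSec(r))
}
```

Inside a _test.go file the same benchmark function would be named BenchmarkXxx and run with go test -bench, which prints the MB/s column itself once SetBytes has been called.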

Comparison with system bzip2 (1 MB data)

Implementation  Compression time  Compression throughput
gobz2 (single)  68.6 ms           15.28 MB/s
system bzip2    77.1 ms           13.61 MB/s

In single-threaded mode, gobz2 compresses about 12% faster than system bzip2 (15.28 MB/s vs 13.61 MB/s), and it decompresses about 2× faster than Go's standard compress/bzip2 package.

Documentation

API docs: godoc.

License

BSD 3-Clause - see COPYING for details.