Manta Archiver (reptar)
A multi‑format archive tool with GNU tar compatibility, built-in metadata, PAR2 recovery, and three‑point consensus validation.
Note: This project is under active development. Core archive formats (PAX, USTAR, GNU tar, ZIP, CPIO (newc and odc), AR) and compression (gzip, bzip2, xz, LZW) are stable and usable. Advanced features (metadata, repair, consensus validation) are fully implemented. All planned phases (1‑9) are complete; remaining work focuses on polish and extended format features.
Manta Archiver (command‑line tool reptar) is a modern archive utility written in Go that supports multiple archive formats (PAX, USTAR, GNU tar, ZIP, CPIO (newc and odc), AR) and compression methods (gzip, bzip2, xz, LZW) while preserving GNU tar command‑line compatibility. It is designed to be a drop‑in replacement for tar with additional features such as embedded metadata, integrity validation, PAR2‑based recovery, and three‑point consensus validation.
Table of Contents
- Features
- Installation
- Quick Start
- Supported Formats
- Compression
- Advanced Usage
- Consensus Validation
- Design Goals
- Project Status
- Building from Source
- Contributing
- License
- Acknowledgments
Features
✅ Implemented
- Multi‑format support – read/write PAX (POSIX.1‑2001), USTAR (POSIX.1‑1988), GNU tar, ZIP, CPIO newc (SVR4) and odc (old binary), and AR (Unix archive) formats
- GNU tar CLI compatibility – supports `-c` (create), `-x` (extract), `-t` (list), `-f`, `-v`, `-C`, `-z` (gzip), `-j` (bzip2), `-J` (xz), `-Z` (compress), `--format`, etc.
- Auto‑detection – detects archive format from magic bytes or file extension
- Transparent compression – gzip, bzip2, xz, LZW (compress) with auto-detection from magic bytes
- Compression configuration – configurable levels for gzip (`--gzip-level`), bzip2 block size (`--bzip2-block-size`), etc.
- Chain compression – ZIP format uses internal deflate compression with configurable levels
- Multi-stream support – reads concatenated gzip streams (common in log files)
- Safe extraction – path traversal protection, `--strip-components`, `--overwrite`, `--keep-old-files`
- Cross‑platform – pure Go, runs on Linux, macOS, Windows
- Extensible architecture – pluggable format registry for easy addition of new formats
- Built‑in file hashing – BLAKE3 and CRC32 hashes for each file, stored in metadata directory (`.manta_meta/metadata.csv`)
- Integrity validation – verify archive contents with the `-V` flag (compares hashes against internal or external metadata)
- Cross‑validation – verify consistency between internal metadata (`.manta_meta/metadata.csv`) and external store
- External metadata store – store file hashes, timestamps, archive metadata, and PAR2 recovery metadata in an external SQLite database or archive file (optional, with the `--meta-db` and `--no-internal-meta` flags). The archive format reuses the internal metadata directory structure and supports tar, zip, cpio, and ar formats.
- External PAR2 parity storage – store parity blocks in external store with the `--external-only-par2` flag, providing resilience against archive header corruption; the optional `--internal-only-par2` flag stores parity only inside the archive
- PAR2 recovery data generation – generate Reed‑Solomon parity data with the `--repair` flag, stored in `.manta_meta/par2/`; supports post‑creation repair generation with `-o` output archive
- PAR2 recovery (repair) during validation – automatically detect corruption and attempt repair using parity data when `-V --repair` is used
- Repair during extraction – reconstruct corrupted files on‑the‑fly when extracting with `-x --repair`
- Extraction with external metadata verification – validate extracted files against external store (when `--meta-db` is provided)
- Consensus validation pattern – three‑level validation ensuring agreement between archive data, internal metadata, and external metadata (see Consensus Validation for details)
- Streaming operation – read from stdin (`-f -`) and write to stdout with metadata generation via tee readers (PAR2 generation in streaming mode is deferred)
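The auto‑detection listed above relies on well‑known magic numbers. reptar's real detector lives in its internal packages and is not shown in this README; a minimal sketch of the idea, using the standard magic bytes for the four supported compression formats, might look like:

```go
package main

import (
	"bytes"
	"fmt"
)

// detectCompression inspects the first bytes of a stream and returns the
// compression method, using the standard magic numbers for each format.
// Illustrative only; reptar's real detector is in its internal packages.
func detectCompression(head []byte) string {
	switch {
	case bytes.HasPrefix(head, []byte{0x1f, 0x8b}):
		return "gzip"
	case bytes.HasPrefix(head, []byte("BZh")):
		return "bzip2"
	case bytes.HasPrefix(head, []byte{0xfd, '7', 'z', 'X', 'Z', 0x00}):
		return "xz"
	case bytes.HasPrefix(head, []byte{0x1f, 0x9d}):
		return "lzw" // .Z (compress)
	}
	return "none"
}

func main() {
	fmt.Println(detectCompression([]byte{0x1f, 0x8b, 0x08})) // gzip
}
```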
🔄 In Progress / Planned (see ROADMAP.md)
- Advanced GNU tar features – sparse file write support (read‑only currently), incremental backups (`--listed-incremental`), extended ACLs
- Split archives – multi‑volume ZIP support (deferred to future phase)
- Enhanced streaming – PAR2 generation in pure streaming mode (currently deferred)
⚠️ Limitations
- PAR2 recovery: Parity blocks (binary Reed‑Solomon data) are stored inside the archive (under `.manta_meta/par2/`) by default, but can optionally be stored in an external store when `--meta-db` is used (see the `--external-only-par2` and `--internal-only-par2` flags). This provides resilience against archive header corruption, as parity data remains accessible even if the archive structure is damaged. PAR2 metadata (recovery percentage, block size, list of parity file names) is always stored in the external store when `--meta-db` is used.
Installation
From source (requires Go 1.25+)
go install git.lan.thwap.org/thwap/manta-archiver/cmd/reptar@latest
The binary reptar will be installed in $GOPATH/bin (or $GOBIN).
Pre‑built binaries
Not yet available – planned for future releases.
Quick Start
Create a PAX archive (default format)
reptar -c -f archive.tar dir1 file2.txt
List contents of any supported archive
reptar -t -f archive.tar
reptar -t -f archive.zip
reptar -t -f archive.cpio
Extract with automatic compression detection
reptar -x -f archive.tar.gz # gzip
reptar -x -f archive.tar.bz2 # bzip2
reptar -x -f archive.tar.xz # xz
reptar -x -f archive.tar.Z # LZW (compress)
Explicit format selection
reptar -c -f archive.zip --format=zip src/
reptar -c -f archive.cpio --format=cpio src/
reptar -c -f archive.ar --format=ar src/ # AR format
Compression examples
reptar -cjvf archive.tar.bz2 src/ # bzip2 compression
reptar -cJvf archive.tar.xz src/ # xz compression
reptar -cZvf archive.tar.Z src/ # LZW compression
reptar -czvf archive.tar.gz --gzip-level=9 src/ # maximum gzip compression
GNU tar compatibility
reptar -czvf backup.tar.gz /path/to/data
reptar -xzvf backup.tar.gz -C /tmp
Examples per format
# PAX (default)
reptar -c -f archive.tar src/
# USTAR
reptar -c -f archive.tar --format=ustar src/
# GNU tar
reptar -c -f archive.tar --format=gnu src/
# ZIP
reptar -c -f archive.zip --format=zip src/
# CPIO (newc)
reptar -c -f archive.cpio --format=cpio src/
# AR
reptar -c -f archive.ar --format=ar src/
Supported Formats
| Format | Read | Write | Notes |
|---|---|---|---|
| PAX (POSIX.1‑2001) | ✅ | ✅ | Default format, unlimited path lengths, nanosecond timestamps |
| USTAR (POSIX.1‑1988) | ✅ | ✅ | 100‑char name + 155‑char prefix limits, 8 GiB file size limit |
| GNU tar | ✅ | ✅ | Long name/link extensions, sparse files (read‑only) |
| ZIP | ✅ | ✅ | Uses standard archive/zip, UTF‑8 filenames |
| CPIO newc (SVR4) | ✅ | ✅ | ASCII headers, directories, symlinks, regular files |
| CPIO odc (old binary) | ✅ | ✅ | Binary headers, legacy compatibility |
| AR (Unix archive) | ✅ | ✅ | BSD/System V variants, symbol tables, regular files, directories, symlinks |
Note: CPIO format defaults to newc (SVR4); use `--format=cpio-odc` for the old binary format.
Detailed format comparison
| Format | Max file size | Path length | Compression | Metadata support | Typical use case |
|---|---|---|---|---|---|
| PAX | Unlimited (≥8 EiB) | Unlimited | External only (gzip, bzip2, etc.) | Full (internal & external) | Modern archives, long paths, high precision timestamps |
| USTAR | 8 GiB | 256 chars (prefix+name) | External only | Limited (no extended attributes) | Legacy compatibility, small archives |
| GNU tar | 8 GiB (sparse files larger) | Unlimited via extensions | External only | Limited (sparse, incremental) | GNU/Linux backups, sparse files |
| ZIP | 4 GiB (unlimited with ZIP64) | Unlimited (UTF‑8) | Internal (deflate, store, bzip2) | Limited (extra fields) | Cross‑platform distribution, web archives |
| CPIO newc | Unlimited | Unlimited | External only | Limited (basic attributes) | RPM packages, kernel initramfs |
| CPIO odc | Unlimited | Unlimited | External only | Limited (basic attributes) | Legacy Unix backups |
| AR | Unlimited | Unlimited (file name length limited) | None | Limited (symbol tables) | Static libraries, package files |
Compression
| Method | Read | Write | Auto‑detect | Flag |
|---|---|---|---|---|
| gzip | ✅ | ✅ | ✅ (.gz, .tgz) | `-z` |
| bzip2 | ✅ | ✅ | ✅ (.bz2, .tbz2) | `-j` |
| LZW (compress) | ✅ | ✅ | ✅ (.Z, .taz) | `-Z` |
| xz | ✅ | ✅ | ✅ (.xz, .txz) | `-J` |
Advanced Usage
Change working directory before archiving
reptar -c -f archive.tar -C /usr/local/bin .
Strip leading path components on extraction
reptar -x -f archive.tar --strip-components=2
Verbose output
reptar -cvf archive.tar src/
Overwrite existing files
reptar -x -f archive.tar --overwrite
Keep old files (do not overwrite)
reptar -x -f archive.tar --keep-old-files
PAR2 recovery data generation
Generate per‑file parity data (Reed‑Solomon) for archive files, stored inside the metadata directory. Use --repair to enable, and configure redundancy, block size, memory, and workers.
# Create archive with 10% redundancy (default)
reptar -c --repair -f archive.tar.gz src/
# Increase redundancy to 25% and use 4 workers
reptar -c --repair --repair-percent 25 --repair-workers 4 -f archive.tar.gz src/
# Use 2 MiB block size and limit memory to 512 MiB
reptar -c --repair --repair-block-size $((2*1024*1024)) --repair-memory $((512*1024*1024)) -f archive.tar.gz src/
Parity files are stored under .manta_meta/par2/ as <hash>_<index>.par with accompanying manifest files. When an external store is used (--meta-db), parity blocks can optionally be stored in the store instead of (or in addition to) the archive (see --external-only-par2 and --internal-only-par2 flags). Recovery (repair) is available during both validation (-V --repair) and extraction (-x --repair), and parity blocks missing from the archive will be fetched from the external store if available.
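The README does not spell out the parity math, but a reasonable mental model is: the file is divided into fixed‑size blocks, and the redundancy percentage of that block count, rounded up, becomes parity. A sketch under that assumption (reptar's exact rounding may differ):

```go
package main

import "fmt"

// parityBlocks estimates the number of parity blocks for a file, assuming
// parity = ceil(dataBlocks * percent / 100). This formula is an assumption
// for illustration; reptar's exact rounding may differ.
func parityBlocks(fileSize, blockSize, percent int64) int64 {
	dataBlocks := (fileSize + blockSize - 1) / blockSize // ceil division
	return (dataBlocks*percent + 99) / 100
}

func main() {
	// 100 MiB file, 1 MiB blocks, 10% redundancy -> 10 parity blocks
	fmt.Println(parityBlocks(100<<20, 1<<20, 10)) // 10
}
```

Under this model, raising `--repair-percent` or shrinking `--repair-block-size` both increase the number of parity blocks, trading archive size for recoverable damage.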
PAR2 recovery tutorial
When an archive becomes corrupted (e.g., due to storage errors), you can use the PAR2 recovery data to repair it. The following examples assume you have an archive backup.tar.gz that was created with --repair (or you have generated PAR2 files separately with reptar --repair -f backup.tar.gz).
Validate and repair an archive
# Validate integrity and attempt repair if corruption detected
reptar -V --repair -f backup.tar.gz
# If corruption is limited to specific files, the repair process will
# reconstruct them using the parity blocks. Success depends on the
# redundancy percentage and the amount of damage.
Extract with automatic repair
# Extract files, repairing any corrupted files on the fly
reptar -x --repair -f backup.tar.gz -C /restored/path
# The tool will first validate each file's hash, and if mismatches are found,
# attempt recovery using PAR2 data before writing the restored file to disk.
Generate PAR2 data for an existing archive
# Add PAR2 recovery data to an archive that was created without it
reptar --repair -f existing.tar.gz -o existing_with_par2.tar.gz
# This creates a new archive with the same content but includes the
# `.manta_meta/par2/` directory with parity files.
Check recovery coverage
# List PAR2 files and their redundancy percentage
reptar -t -f backup.tar.gz | grep -E '\.par|manifest'
# The manifest files (`.manta_meta/par2/*.manifest`) contain metadata
# about the recovery set, including the redundancy percentage and block size.
PAR2 storage location options
When using an external store (--meta-db), you can control where parity blocks are stored:
- Default (no flags) – parity blocks are stored both inside the archive (under `.manta_meta/par2/`) and in the external database. This provides maximum resilience: parity is accessible even if the archive header is corrupted, while still being self‑contained within the archive.
- `--external-only-par2` – parity blocks are stored only in the external database, not inside the archive. Requires `--meta-db`. Use this when you want to keep the archive smaller and rely entirely on external storage for recovery data.
- `--internal-only-par2` – parity blocks are stored only inside the archive, ignoring the external database (even if `--meta-db` is given). Use this when you want to keep parity data self‑contained and avoid storing binary BLOBs in the SQLite database.
These flags are mutually exclusive. Parity metadata (recovery percentage, block size, file names) is always stored in the external database when `--meta-db` is used, regardless of where the actual parity blocks are stored.
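The constraints just described can be sketched as a small validation function, with hypothetical names (reptar's actual option handling is not shown in this README):

```go
package main

import (
	"errors"
	"fmt"
)

// validatePar2Flags enforces the documented rules: the two flags are
// mutually exclusive, and --external-only-par2 requires --meta-db.
// Hypothetical helper; not reptar's actual option parser.
func validatePar2Flags(externalOnly, internalOnly, hasMetaDB bool) error {
	if externalOnly && internalOnly {
		return errors.New("--external-only-par2 and --internal-only-par2 are mutually exclusive")
	}
	if externalOnly && !hasMetaDB {
		return errors.New("--external-only-par2 requires --meta-db")
	}
	return nil
}

func main() {
	fmt.Println(validatePar2Flags(true, false, true)) // <nil>
}
```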
Examples:
# Store parity both inside archive and in external database (default)
reptar -c --repair --meta-db metadata.db -f archive.tar.gz src/
# Store parity only in external database
reptar -c --repair --meta-db metadata.db --external-only-par2 -f archive.tar.gz src/
# Store parity only inside archive (ignore external database for parity blocks)
reptar -c --repair --meta-db metadata.db --internal-only-par2 -f archive.tar.gz src/
# Repair operations automatically fetch missing parity blocks from external store
reptar -x --repair --meta-db metadata.db -f archive.tar.gz
External metadata store
You can store file metadata (BLAKE3/CRC32 hashes, timestamps, permissions) in an external SQLite3 database or archive file (tar, zip, cpio, ar) instead of (or in addition to) embedding them inside the archive. This is useful for auditing, integrity verification across multiple archives, or when you want to keep metadata separate. The archive format reuses the same directory structure as internal metadata (.manta_meta/), making conversion between SQLite and archive formats trivial. The store type is determined by the file extension: .db or .sqlite for SQLite, .tar, .tar.gz, .zip, .cpio, .ar (and compressed variants) for archive stores.
# Create archive and store metadata in external database
reptar -c -f archive.tar --meta-db metadata.db src/
# Skip internal metadata embedding (only external)
reptar -c -f archive.tar --meta-db metadata.db --no-internal-meta src/
# Verify archive using external metadata database
reptar -V -f archive.tar --meta-db metadata.db
# Create archive and store metadata in external archive file (e.g., tar)
reptar -c -f archive.tar --meta-db metadata.tar src/
# Verify using archive metadata store
reptar -V -f archive.tar --meta-db metadata.tar
The store is created automatically if it does not exist. For SQLite databases, three tables are created: archives, files, and par2_metadata. The archive path is stored as an absolute path for uniqueness. When --repair is used with --meta-db, PAR2 recovery metadata (recovery percentage, block size, list of parity file names) is also stored in the par2_metadata table (SQLite) or in the archive's metadata directory (archive store), while the actual parity blocks can be stored either inside the archive, in the external store, or both, depending on the --external-only-par2 and --internal-only-par2 flags (see PAR2 storage location options). The --no-internal-meta flag instructs the archive writer not to write the .manta_meta/metadata.csv file inside the archive, relying solely on the external store.
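The extension rule described above can be sketched as follows; the exact list of recognized compressed variants (e.g. .tgz) is an assumption here:

```go
package main

import (
	"fmt"
	"strings"
)

// storeType infers the external store kind from the file name, following the
// extension rule described in the README: .db/.sqlite select SQLite, archive
// extensions select an archive store. The compressed-variant list is an
// assumption; reptar may recognize others.
func storeType(path string) string {
	lower := strings.ToLower(path)
	if strings.HasSuffix(lower, ".db") || strings.HasSuffix(lower, ".sqlite") {
		return "sqlite"
	}
	for _, ext := range []string{".tar", ".tar.gz", ".tgz", ".tar.bz2", ".tar.xz", ".zip", ".cpio", ".ar"} {
		if strings.HasSuffix(lower, ext) {
			return "archive"
		}
	}
	return "unknown"
}

func main() {
	fmt.Println(storeType("metadata.db"), storeType("metadata.tar.gz")) // sqlite archive
}
```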
Example Workflows
Backup with external metadata DB and PAR2 recovery
This workflow creates a nightly backup of a directory, stores metadata in an external store, and adds PAR2 recovery data for resilience.
#!/bin/bash
BACKUP_DIR=/home/user/data
ARCHIVE=/backups/data-$(date +%Y%m%d).tar.gz
DB=/backups/metadata.db
reptar -c --repair --repair-percent 10 \
-f "$ARCHIVE" \
--meta-db "$DB" \
--no-internal-meta \
-v "$BACKUP_DIR"
Validate and auto‑repair archive weekly via cron
Add a cron job that validates the backup archive and attempts repair if corruption is detected.
# In crontab -e
0 2 * * 0 reptar -V --repair -f /backups/data-*.tar.gz --meta-db /backups/metadata.db 2>&1 | logger -t reptar-validate
Stream backups with compression and metadata
Pipe data from a streaming source (e.g., mysqldump) directly into reptar, compressing on the fly and storing metadata externally.
mysqldump -u root mydb | reptar -c -f - --format=pax \
--meta-db /backups/db_metadata.db \
-z --gzip-level=9 \
-o /backups/mydb-$(date +%Y%m%d).tar.gz
Note: Streaming operation currently requires reading from stdin (-f -) and writing to a file (-o).
Consensus Validation
Manta Archiver implements a three‑level consensus validation pattern that ensures data integrity through agreement between multiple independent sources:
Validation Levels
| Level | Metadata sources | Required agreement | Description |
|---|---|---|---|
| Level 0 | None | – | No validation performed (default when no metadata is present). |
| Level 1 | Internal metadata only | 2‑point agreement: archive data ↔ internal metadata | Validates each file against the embedded .manta_meta/metadata.csv. |
| Level 2 | External metadata only | 2‑point agreement: archive data ↔ external metadata | Validates each file against the external SQLite store (--meta-db). |
| Level 3 | Both internal and external metadata | 3‑point consensus: archive data ↔ internal metadata ↔ external metadata | All three sources must agree; mismatches are reported as validation failures. |
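The level selection in the table reduces to which metadata sources are present; a trivial sketch with a hypothetical function name:

```go
package main

import "fmt"

// validationLevel maps the available metadata sources to the consensus
// validation level described in the table. Hypothetical helper name.
func validationLevel(hasInternal, hasExternal bool) int {
	switch {
	case hasInternal && hasExternal:
		return 3 // three-point consensus
	case hasExternal:
		return 2 // archive data vs. external metadata
	case hasInternal:
		return 1 // archive data vs. internal metadata
	default:
		return 0 // no validation possible
	}
}

func main() {
	fmt.Println(validationLevel(true, true)) // 3
}
```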
How it works
When you run reptar -V (validate) with both internal metadata (embedded in the archive) and an external store (--meta-db), the validation process checks:
- Archive data vs. external metadata – computes BLAKE3/CRC32 hashes of each file and compares them with the external store (required).
- Archive data vs. internal metadata – the same comparison against the embedded `metadata.csv` (if present).
- Internal vs. external metadata – cross‑validation between the two metadata sources.
A file is considered valid only when all three comparisons agree. If any source disagrees, the validation fails for that file, and the mismatch is reported with a prefix indicating which source failed (external mismatch, internal mismatch, or cross‑validation error).
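The per‑file logic can be sketched as follows (hypothetical names; with exact hash equality the cross‑validation check is implied whenever the first two pass, but it is kept here to mirror the documented comparisons):

```go
package main

import (
	"errors"
	"fmt"
)

// checkConsensus applies the per-file comparisons described above. An empty
// internalHash means the archive carries no embedded metadata for the file.
// Hypothetical names; error strings mirror the README's mismatch prefixes.
func checkConsensus(archiveHash, internalHash, externalHash string) error {
	if archiveHash != externalHash {
		return errors.New("external mismatch")
	}
	if internalHash != "" && archiveHash != internalHash {
		return errors.New("internal mismatch")
	}
	// Cross-validation between the two metadata sources; with exact hash
	// equality this is implied by the checks above when both pass.
	if internalHash != "" && internalHash != externalHash {
		return errors.New("cross-validation error")
	}
	return nil
}

func main() {
	fmt.Println(checkConsensus("b3aa01", "b3aa01", "b3aa01")) // <nil>
}
```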
Automatic repair with PAR2
If validation fails and PAR2 recovery data is available (generated with --repair), the tool automatically attempts repair:
- During validation (`-V --repair`) – reconstructs corrupted files using parity blocks, then re‑validates.
- During extraction (`-x --repair`) – repairs on the fly before writing files to disk.
- Strict vs. lenient mode – Level 3 validation can be configured to accept two‑point agreement with a warning (lenient) or require three‑point agreement (strict). Currently the implementation is strict.
Use cases
- High‑integrity backups – create archives with both internal and external metadata, then validate with three‑point consensus to ensure no silent corruption.
- Forensic archiving – store metadata separately (external DB) for independent verification; compare with embedded metadata to detect tampering.
- Resilient storage – combine three‑point validation with PAR2 recovery to automatically detect and repair corruption.
Design Goals
- GNU tar compatibility – accept the same flags and produce archives that standard tools can read.
- Multi‑format core – a single tool for tar, zip, cpio, and future formats.
- Integrity and recovery – built‑in hashing (BLAKE3/CRC32) and PAR2 repair data.
- Extensible metadata – store file hashes, timestamps, and other metadata inside the archive or in an external store.
- Performance – competitive with native tools, leverage Go's concurrency for parallel hashing and compression.
Project Status
This project is actively developed. The core archive formats (PAX, USTAR, GNU, ZIP, CPIO, AR) and compression (gzip, bzip2, xz, LZW) are stable and usable. The roadmap is organized into phases:
- Phase 1 – Foundation & core archive structure ✅
- Phase 2 – Archive format support ✅ (PAX, USTAR, GNU, ZIP, CPIO, AR)
- Phase 3 – Compression format support ✅ (gzip, bzip2, xz, LZW)
- Phase 4 – Metadata directory & hashing ✅ (verification & PAR2 generation)
- Phase 5 – External SQLite3 metadata store ✅
- Phase 6 – Archive validation & PAR2 repair ✅ (validation with external SQLite DB, cross‑validation, exit‑code error types, post‑creation repair generation, extraction using external DB, and repair during validation implemented)
- Phase 7 – Advanced GNU tar features ✅ (CLI flag parity, ZIP encryption, streaming; split archives deferred)
- Phase 8 – Testing & benchmarking ✅
- Phase 9 – Documentation & distribution ✅
See ROADMAP.md for detailed task breakdown and issue tracking.
Building from Source
git clone https://git.lan.thwap.org/thwap/manta-archiver.git
cd manta-archiver
go build ./cmd/reptar
./reptar --help
Run tests:
go test ./...
Contributing
Contributions are welcome! Please open an issue to discuss proposed changes or new features before submitting a pull request.
Development Setup
This project uses pre-commit to ensure code quality and consistency. After cloning the repository, install the pre-commit hooks:
# Install pre-commit (if not already installed)
pip install pre-commit
# Install the hooks
pre-commit install
# Optionally, run hooks on all files
pre-commit run --all-files
The hooks include:
- Go formatting (gofmt)
- Go linting (go vet, optional golangci-lint - see note below)
- Markdown linting
- YAML linting
- Spell checking
- General file hygiene (trailing whitespace, end-of-file newlines, etc.)
Note on golangci-lint: The configuration includes an optional golangci-lint hook (commented out by default). To enable it:
- Install golangci-lint: `go install github.com/golangci/golangci-lint/cmd/golangci-lint@latest`
- Uncomment the golangci-lint section in `.pre-commit-config.yaml`
- Run `pre-commit install --hook-type pre-commit` to update hooks
All hooks run automatically on git commit. You can also run them manually with pre-commit run --all-files.
Acknowledgments
- The Go standard library `archive/tar` and `archive/zip` packages
- GNU tar for the comprehensive command‑line interface specification
- The PAR2 specification for error‑correcting archive recovery
Manta Archiver – because your data deserves a safety net.