Manta Archiver (reptar)

A multi-format archive tool with GNU tar compatibility, built-in metadata, PAR2 recovery, and three-point consensus validation.

Note: This project is under active development. Core archive formats (PAX, USTAR, GNU tar, ZIP, CPIO (newc and odc), AR) and compression (gzip, bzip2, xz, LZW) are stable and usable. Advanced features (metadata, repair, consensus validation) are fully implemented. All planned phases (1-9) are complete; remaining work focuses on polish and extended format features.

Manta Archiver (command-line tool reptar) is a modern archive utility written in Go that supports multiple archive formats (PAX, USTAR, GNU tar, ZIP, CPIO (newc and odc), AR) and compression methods (gzip, bzip2, xz, LZW) while preserving GNU tar command-line compatibility. It is designed to be a drop-in replacement for tar with additional features such as embedded metadata, integrity validation, PAR2-based recovery, and three-point consensus validation.

Features

Implemented

  • Multi-format support: read/write PAX (POSIX.1-2001), USTAR (POSIX.1-1988), GNU tar, ZIP, CPIO newc (SVR4) and odc (old binary), and AR (Unix archive) formats
  • GNU tar CLI compatibility: supports -c (create), -x (extract), -t (list), -f, -v, -C, -z (gzip), -j (bzip2), -J (xz), -Z (compress), --format, etc.
  • Auto-detection: detects archive format from magic bytes or file extension
  • Transparent compression: gzip, bzip2, xz, LZW (compress) with auto-detection from magic bytes
  • Compression configuration: configurable levels for gzip (--gzip-level), bzip2 block size (--bzip2-block-size), etc.
  • Chain compression: ZIP format uses internal deflate compression with configurable levels
  • Multi-stream support: reads concatenated gzip streams (common in log files)
  • Safe extraction: path traversal protection, --strip-components, --overwrite, --keep-old-files
  • Cross-platform: pure Go, runs on Linux, macOS, Windows
  • Extensible architecture: pluggable format registry for easy addition of new formats
  • Built-in file hashing: BLAKE3 and CRC32 hashes for each file, stored in metadata directory (.manta_meta/metadata.csv)
  • Integrity validation: verify archive contents with -V flag (compares hashes against internal or external metadata)
  • Cross-validation: verify consistency between internal metadata (.manta_meta/metadata.csv) and external store
  • External metadata store: store file hashes, timestamps, archive metadata, and PAR2 recovery metadata in an external SQLite database or archive file (optional, with --meta-db and --no-internal-meta flags). The archive format reuses the internal metadata directory structure and supports tar, zip, cpio, and ar formats.
  • External PAR2 parity storage: store parity blocks in external store with --external-only-par2 flag, providing resilience against archive header corruption; optional --internal-only-par2 flag stores parity only inside archive
  • PAR2 recovery data generation: generate Reed-Solomon parity data with --repair flag, stored in .manta_meta/par2/; supports post-creation repair generation with -o output archive
  • PAR2 recovery (repair) during validation: automatically detect corruption and attempt repair using parity data when -V --repair is used
  • Repair during extraction: reconstruct corrupted files on the fly when extracting with -x --repair
  • Extraction with external metadata verification: validate extracted files against external store (when --meta-db is provided)
  • Consensus validation pattern: three-level validation ensuring agreement between archive data, internal metadata, and external metadata (see Consensus Validation for details)
  • Streaming operation: read from stdin (-f -) and write to stdout with metadata generation via tee readers (PAR2 generation in streaming mode is deferred)
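
To make the "path traversal protection" item above concrete, here is a minimal sketch of the kind of check a safe extractor applies to each member name before writing it. The function name is_safe_member is hypothetical and this is not reptar's actual implementation; it only illustrates the idea of rejecting absolute paths and any ".." component.

```shell
# is_safe_member PATH
# Returns 0 if PATH is relative and contains no ".." component.
# Illustrative only: this is an assumption, not reptar's real check.
is_safe_member() {
    case "$1" in
        ""|/*) return 1 ;;                  # empty or absolute path
        ..|../*|*/..|*/../*) return 1 ;;    # any ".." path component
    esac
    return 0
}
```

A name such as src/main.go passes, while ../etc/passwd or /etc/passwd is rejected. A name like a/..b/c is accepted because "..b" is an ordinary component, not a parent reference.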

🔄 In Progress / Planned (see ROADMAP.md)

  • Advanced GNU tar features: sparse file write support (read-only currently), incremental backups (--listed-incremental), extended ACLs
  • Split archives: multi-volume ZIP support (deferred to future phase)
  • Enhanced streaming: PAR2 generation in pure streaming mode (currently deferred; streaming obviates PAR2)

⚠️ Limitations

  • PAR2 recovery: Parity blocks (binary Reed-Solomon data) are stored inside the archive (under .manta_meta/par2/) by default, but can optionally be stored in an external store when --meta-db is used (see --external-only-par2 and --internal-only-par2 flags). This provides resilience against archive header corruption, as parity data remains accessible even if the archive structure is damaged. PAR2 metadata (recovery percentage, block size, list of parity file names) is always stored externally in the store when --meta-db is used.

Installation

From source (requires Go 1.25+)

go install git.lan.thwap.org/thwap/manta-archiver/cmd/reptar@latest

The binary reptar will be installed in $GOPATH/bin (or $GOBIN).

Prebuilt binaries

Not yet available; planned for future releases.

Quick Start

Create a PAX archive (default format)

reptar -c -f archive.tar dir1 file2.txt

List contents of any supported archive

reptar -t -f archive.tar
reptar -t -f archive.zip
reptar -t -f archive.cpio

Extract with automatic compression detection

reptar -x -f archive.tar.gz    # gzip
reptar -x -f archive.tar.bz2   # bzip2
reptar -x -f archive.tar.xz    # xz
reptar -x -f archive.tar.Z     # LZW (compress)

Explicit format selection

reptar -c -f archive.zip --format=zip src/
reptar -c -f archive.cpio --format=cpio src/
reptar -c -f archive.ar --format=ar src/   # AR format

Compression examples

reptar -cjvf archive.tar.bz2 src/          # bzip2 compression
reptar -cJvf archive.tar.xz src/           # xz compression
reptar -cZvf archive.tar.Z src/            # LZW compression
reptar -czvf archive.tar.gz --gzip-level=9 src/  # maximum gzip compression

GNU tar compatibility

reptar -czvf backup.tar.gz /path/to/data
reptar -xzvf backup.tar.gz -C /tmp

Examples per format

# PAX (default)
reptar -c -f archive.tar src/

# USTAR
reptar -c -f archive.tar --format=ustar src/

# GNU tar
reptar -c -f archive.tar --format=gnu src/

# ZIP
reptar -c -f archive.zip --format=zip src/

# CPIO (newc)
reptar -c -f archive.cpio --format=cpio src/

# AR
reptar -c -f archive.ar --format=ar src/

Supported Formats

| Format | Read | Write | Notes |
|---|---|---|---|
| PAX (POSIX.1-2001) | Yes | Yes | Default format, unlimited path lengths, nanosecond timestamps |
| USTAR (POSIX.1-1988) | Yes | Yes | 100-character name plus 155-character prefix, 8 GiB file size limit |
| GNU tar | Yes | Yes | Long name/link extensions, sparse files (read-only) |
| ZIP | Yes | Yes | Uses standard archive/zip, UTF-8 filenames |
| CPIO newc (SVR4) | Yes | Yes | ASCII headers, directories, symlinks, regular files |
| CPIO odc (old binary) | Yes | Yes | Binary headers, legacy compatibility |
| AR (Unix archive) | Yes | Yes | BSD/System V variants, symbol tables, regular files, directories, symlinks |

Note: CPIO format defaults to newc (SVR4); use --format=cpio-odc for old binary format.

Detailed format comparison

| Format | Max file size | Path length | Compression | Metadata support | Typical use case |
|---|---|---|---|---|---|
| PAX | Unlimited (≥8 EiB) | Unlimited | External only (gzip, bzip2, etc.) | Full (internal & external) | Modern archives, long paths, high-precision timestamps |
| USTAR | 8 GiB | 256 chars (prefix+name) | External only | Limited (no extended attributes) | Legacy compatibility, small archives |
| GNU tar | 8 GiB (sparse files larger) | Unlimited via extensions | External only | Limited (sparse, incremental) | GNU/Linux backups, sparse files |
| ZIP | 4 GiB (unlimited with ZIP64) | Unlimited (UTF-8) | Internal (deflate, store, bzip2) | Limited (extra fields) | Cross-platform distribution, web archives |
| CPIO newc | Unlimited | Unlimited | External only | Limited (basic attributes) | RPM packages, kernel initramfs |
| CPIO odc | Unlimited | Unlimited | External only | Limited (basic attributes) | Legacy Unix backups |
| AR | Unlimited | Unlimited (file name length limited) | None | Limited (symbol tables) | Static libraries, package files |

Compression

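| Method | Read | Write | Auto-detect | Flag |
|---|---|---|---|---|
| gzip (.gz, .tgz) | Yes | Yes | Yes | -z |
| bzip2 (.bz2, .tbz2) | Yes | Yes | Yes | -j |
| LZW (compress) (.Z, .taz) | Yes | Yes | Yes | -Z |
| xz (.xz, .txz) | Yes | Yes | Yes | -J |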
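
Auto-detection by magic bytes can be illustrated with a small shell sketch that inspects the first bytes of a file. The signatures used are the well-known ones for each format (gzip 1f 8b, bzip2 "BZh", xz fd 37 7a 58 5a 00, compress 1f 9d). The function name detect_compression is hypothetical, and reptar's own detection logic may differ in detail.

```shell
# detect_compression FILE
# Prints gzip, bzip2, xz, compress, or none, based on the first bytes of FILE.
# A sketch for illustration; not taken from reptar's source.
detect_compression() {
    # Read up to six leading bytes as one lowercase hex string.
    magic=$(od -An -tx1 -N6 "$1" | tr -d ' \n')
    case "$magic" in
        1f8b*)         echo gzip ;;      # gzip: 1f 8b
        425a68*)       echo bzip2 ;;     # bzip2: ASCII "BZh"
        fd377a585a00*) echo xz ;;        # xz: fd "7zXZ" 00
        1f9d*)         echo compress ;;  # compress (LZW): 1f 9d
        *)             echo none ;;
    esac
}
```

Because the check reads content rather than the name, a gzip stream saved as backup.bin would still be reported as gzip.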

Advanced Usage

Change working directory before archiving

reptar -c -f archive.tar -C /usr/local/bin .

Strip leading path components on extraction

reptar -x -f archive.tar --strip-components=2
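
To show what stripping path components means, the sketch below applies the same transformation to a single member name. It assumes reptar follows GNU tar's semantics for this flag (members with too few components are skipped); the helper name strip_components is hypothetical.

```shell
# strip_components N PATH
# Prints PATH with its first N components removed. Prints nothing and
# returns 1 if PATH has N or fewer components (such members are skipped).
strip_components() {
    n=$1
    path=$2
    while [ "$n" -gt 0 ]; do
        case "$path" in
            */*) path=${path#*/} ;;   # drop the leading component
            *)   return 1 ;;          # ran out of components
        esac
        n=$((n - 1))
    done
    [ -n "$path" ] && printf '%s\n' "$path"
}
```

With N=2, a member stored as usr/local/bin/reptar would be written as bin/reptar, while a member stored as usr/local would not be extracted at all.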

Verbose output

reptar -cvf archive.tar src/

Overwrite existing files

reptar -x -f archive.tar --overwrite

Keep old files (do not overwrite)

reptar -x -f archive.tar --keep-old-files

PAR2 recovery data generation

Generate per-file parity data (Reed-Solomon) for archive files, stored inside the metadata directory. Use --repair to enable, and configure redundancy, block size, memory, and workers.

# Create archive with 10% redundancy (default)
reptar -c --repair -f archive.tar.gz src/

# Increase redundancy to 25% and use 4 workers
reptar -c --repair --repair-percent 25 --repair-workers 4 -f archive.tar.gz src/

# Use 2 MiB block size and limit memory to 512 MiB
reptar -c --repair --repair-block-size $((2*1024*1024)) --repair-memory $((512*1024*1024)) -f archive.tar.gz src/

Parity files are stored under .manta_meta/par2/ as <hash>_<index>.par with accompanying manifest files. When an external store is used (--meta-db), parity blocks can optionally be stored in the store instead of (or in addition to) the archive (see --external-only-par2 and --internal-only-par2 flags). Recovery (repair) is available during both validation (-V --repair) and extraction (-x --repair), and parity blocks missing from the archive will be fetched from the external store if available.
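
The byte values passed to --repair-block-size and --repair-memory are easier to read with a small helper, and a rough size estimate helps when choosing a redundancy percentage. The sketch below is illustrative: mib and parity_estimate are hypothetical helpers, and the estimate assumes that the redundancy percentage is applied to the input size and rounded up to whole blocks, which this README does not state explicitly.

```shell
# mib N: print N MiB expressed in bytes.
mib() { echo $(( $1 * 1024 * 1024 )); }

# parity_estimate DATA_BYTES PERCENT BLOCK_BYTES
# Rough parity size: PERCENT of the data, rounded up to whole blocks.
# Assumption: redundancy is a percentage of the input size.
parity_estimate() {
    want=$(( ($1 * $2 + 99) / 100 ))     # ceil(data * percent / 100)
    blocks=$(( (want + $3 - 1) / $3 ))   # ceil(want / block size)
    echo $(( blocks * $3 ))
}
```

With these helpers defined, the block-size example can be written as reptar -c --repair --repair-block-size "$(mib 2)" --repair-memory "$(mib 512)" -f archive.tar.gz src/.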

PAR2 recovery tutorial

When an archive becomes corrupted (e.g., due to storage errors), you can use the PAR2 recovery data to repair it. The following examples assume you have an archive backup.tar.gz that was created with --repair (or you have generated PAR2 files separately with reptar --repair -f backup.tar.gz).

Validate and repair an archive

# Validate integrity and attempt repair if corruption detected
reptar -V --repair -f backup.tar.gz

# If corruption is limited to specific files, the repair process will
# reconstruct them using the parity blocks. Success depends on the
# redundancy percentage and the amount of damage.

Extract with automatic repair

# Extract files, repairing any corrupted files on the fly
reptar -x --repair -f backup.tar.gz -C /restored/path

# The tool will first validate each file's hash, and if mismatches are found,
# attempt recovery using PAR2 data before writing the restored file to disk.

Generate PAR2 data for an existing archive

# Add PAR2 recovery data to an archive that was created without it
reptar --repair -f existing.tar.gz -o existing_with_par2.tar.gz

# This creates a new archive with the same content but includes the
# `.manta_meta/par2/` directory with parity files.

Check recovery coverage

# List PAR2 files and their redundancy percentage
reptar -t -f backup.tar.gz | grep -E '\.par|manifest'

# The manifest files (`.manta_meta/par2/*.manifest`) contain metadata
# about the recovery set, including the redundancy percentage and block size.

PAR2 storage location options

When using an external store (--meta-db), you can control where parity blocks are stored:

  • Default (no flags): parity blocks are stored both inside the archive (under .manta_meta/par2/) and in the external database. This provides maximum resilience: parity is accessible even if the archive header is corrupted, while still being self-contained within the archive.
  • --external-only-par2: parity blocks are stored only in the external database, not inside the archive. Requires --meta-db. Use this when you want to keep the archive smaller and rely entirely on external storage for recovery data.
  • --internal-only-par2: parity blocks are stored only inside the archive, ignoring the external database (even if --meta-db is given). Use this when you want to keep parity data self-contained and avoid storing binary BLOBs in the SQLite database.

These flags are mutually exclusive. Parity metadata (recovery percentage, block size, file names) is always stored in the external database when --meta-db is used, regardless of the storage location of the actual parity blocks.
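
The two rules above (the flags are mutually exclusive, and --external-only-par2 requires --meta-db) can be checked in a wrapper script before reptar is invoked. This is a minimal sketch; check_par2_flags is a hypothetical helper and the error messages are made up for illustration, not copied from reptar.

```shell
# check_par2_flags ARGS...
# Returns 0 if the PAR2 storage flags in ARGS form a valid combination.
check_par2_flags() {
    ext=0 int=0 db=0
    for arg in "$@"; do
        case "$arg" in
            --external-only-par2) ext=1 ;;
            --internal-only-par2) int=1 ;;
            --meta-db)            db=1 ;;
        esac
    done
    if [ "$ext" -eq 1 ] && [ "$int" -eq 1 ]; then
        echo "error: --external-only-par2 and --internal-only-par2 are mutually exclusive" >&2
        return 1
    fi
    if [ "$ext" -eq 1 ] && [ "$db" -eq 0 ]; then
        echo "error: --external-only-par2 requires --meta-db" >&2
        return 1
    fi
    return 0
}
```

A script could call check_par2_flags "$@" && reptar "$@" so that an invalid combination fails early with a clear message.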

Examples:

# Store parity both inside archive and in external database (default)
reptar -c --repair --meta-db metadata.db -f archive.tar.gz src/

# Store parity only in external database
reptar -c --repair --meta-db metadata.db --external-only-par2 -f archive.tar.gz src/

# Store parity only inside archive (ignore external database for parity blocks)
reptar -c --repair --meta-db metadata.db --internal-only-par2 -f archive.tar.gz src/

# Repair operations automatically fetch missing parity blocks from external store
reptar -x --repair --meta-db metadata.db -f archive.tar.gz

External metadata store

You can store file metadata (BLAKE3/CRC32 hashes, timestamps, permissions) in an external SQLite3 database or archive file (tar, zip, cpio, ar) instead of (or in addition to) embedding them inside the archive. This is useful for auditing, integrity verification across multiple archives, or when you want to keep metadata separate. The archive format reuses the same directory structure as internal metadata (.manta_meta/), making conversion between SQLite and archive formats trivial. The store type is determined by the file extension: .db or .sqlite for SQLite, .tar, .tar.gz, .zip, .cpio, .ar (and compressed variants) for archive stores.

# Create archive and store metadata in external database
reptar -c -f archive.tar --meta-db metadata.db src/

# Skip internal metadata embedding (only external)
reptar -c -f archive.tar --meta-db metadata.db --no-internal-meta src/

# Verify archive using external metadata database
reptar -V -f archive.tar --meta-db metadata.db

# Create archive and store metadata in external archive file (e.g., tar)
reptar -c -f archive.tar --meta-db metadata.tar src/

# Verify using archive metadata store
reptar -V -f archive.tar --meta-db metadata.tar

The store is created automatically if it does not exist. For SQLite databases, three tables are created: archives, files, and par2_metadata. The archive path is stored as an absolute path for uniqueness. When --repair is used with --meta-db, PAR2 recovery metadata (recovery percentage, block size, list of parity file names) is also stored in the par2_metadata table (SQLite) or in the archive's metadata directory (archive store), while the actual parity blocks can be stored either inside the archive, in the external store, or both, depending on the --external-only-par2 and --internal-only-par2 flags (see PAR2 storage location options). The --no-internal-meta flag instructs the archive writer not to write the .manta_meta/metadata.csv file inside the archive, relying solely on the external store.
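
The extension rule for --meta-db described in this section can be sketched as a shell case statement, which is handy in scripts that need to know which kind of store a path will produce. The function name store_type is hypothetical, and the compressed variants listed beyond .tar.gz are an assumption, since the text only says "and compressed variants".

```shell
# store_type PATH
# Prints "sqlite" or "archive" according to PATH's extension; returns 1
# for anything else. Sketch only; reptar's own mapping may differ.
store_type() {
    case "$1" in
        *.db|*.sqlite)                    echo sqlite ;;
        *.tar|*.tar.gz|*.zip|*.cpio|*.ar) echo archive ;;
        *.tar.bz2|*.tar.xz|*.tar.Z)       echo archive ;;  # assumed variants
        *) echo "unrecognized store extension: $1" >&2
           return 1 ;;
    esac
}
```

For example, store_type metadata.db prints sqlite and store_type metadata.tar prints archive, matching the two sets of commands shown above.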

Example Workflows

Backup with external metadata DB and PAR2 recovery

This workflow creates a nightly backup of a directory, stores metadata in an external store, and adds PAR2 recovery data for resilience.

#!/bin/bash
BACKUP_DIR=/home/user/data
ARCHIVE=/backups/data-$(date +%Y%m%d).tar.gz
DB=/backups/metadata.db

reptar -c --repair --repair-percent 10 \
  -f "$ARCHIVE" \
  --meta-db "$DB" \
  --no-internal-meta \
  -v "$BACKUP_DIR"

Validate and auto-repair archive weekly via cron

Add a cron job that validates the backup archive and attempts repair if corruption is detected.

# In crontab -e
0 2 * * 0 for a in /backups/data-*.tar.gz; do reptar -V --repair -f "$a" --meta-db /backups/metadata.db; done 2>&1 | logger -t reptar-validate
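
When several dated archives accumulate in /backups, it can help to move the validation into a small script that checks each archive separately and reports which ones failed. The sketch below assumes reptar exits with a non-zero status when validation fails (the roadmap mentions exit-code error types, but the exact codes are not documented here); validate_all is a hypothetical helper name.

```shell
# validate_all DB ARCHIVE...
# Runs validation with repair on each archive in turn.
# Returns 1 if any archive failed, 0 otherwise.
validate_all() {
    db=$1
    shift
    failed=0
    for archive in "$@"; do
        if reptar -V --repair -f "$archive" --meta-db "$db"; then
            echo "ok: $archive"
        else
            echo "FAILED: $archive" >&2
            failed=1
        fi
    done
    return "$failed"
}
```

A script containing this function could end with validate_all /backups/metadata.db /backups/data-*.tar.gz and be called from cron, with its output piped to logger as in the entry above.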

Stream backups with compression and metadata

Pipe data from a streaming source (e.g., mysqldump) directly into reptar, compressing on the fly and storing metadata externally.

mysqldump -u root mydb | reptar -c -f - --format=pax \
  --meta-db /backups/db_metadata.db \
  -z --gzip-level=9 \
  -o /backups/mydb-$(date +%Y%m%d).tar.gz

Note: Streaming operation currently requires reading from stdin (-f -) and writing to a file (-o).

Consensus Validation

Manta Archiver implements a three-level consensus validation pattern that ensures data integrity through agreement between multiple independent sources:

Validation Levels

| Level | Metadata sources | Required agreement | Description |
|---|---|---|---|
| Level 0 | None | - | No validation performed (default when no metadata is present). |
| Level 1 | Internal metadata only | 2-point agreement: archive data ↔ internal metadata | Validates each file against the embedded .manta_meta/metadata.csv. |
| Level 2 | External metadata only | 2-point agreement: archive data ↔ external metadata | Validates each file against the external SQLite store (--meta-db). |
| Level 3 | Both internal and external metadata | 3-point consensus: archive data ↔ internal metadata ↔ external metadata | All three sources must agree; mismatches are reported as validation failures. |

How it works

When you run reptar -V (validate) with both internal metadata (embedded in the archive) and an external store (--meta-db), the validation process checks:

  1. Archive data vs. external metadata: computes BLAKE3/CRC32 hashes of each file and compares them with the external store (required).
  2. Archive data vs. internal metadata: same comparison against the embedded metadata.csv (if present).
  3. Internal vs. external metadata: cross-validation between the two metadata sources.

A file is considered valid only when all three comparisons agree. If any source disagrees, the validation fails for that file, and the mismatch is reported with a prefix indicating which source failed (external mismatch, internal mismatch, or cross-validation error).
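
The level selection and the three comparisons can be summarized in a short shell sketch. It works on hash strings that have already been computed and is only a model of the decision logic described above, not reptar's code; the function names validation_level and consensus are hypothetical.

```shell
# validation_level HAS_INTERNAL HAS_EXTERNAL   (each 0 or 1)
# Prints the validation level implied by the available metadata sources.
validation_level() {
    case "$1$2" in
        00) echo 0 ;;   # no metadata: nothing to validate against
        10) echo 1 ;;   # internal metadata only
        01) echo 2 ;;   # external metadata only
        11) echo 3 ;;   # both: three-point consensus
    esac
}

# consensus ARCHIVE_HASH INTERNAL_HASH EXTERNAL_HASH
# Prints "valid" when all three agree; otherwise names each failed
# comparison and returns 1.
consensus() {
    status=0
    [ "$1" = "$3" ] || { echo "external mismatch"; status=1; }
    [ "$1" = "$2" ] || { echo "internal mismatch"; status=1; }
    [ "$2" = "$3" ] || { echo "cross-validation error"; status=1; }
    [ "$status" -eq 0 ] && echo "valid"
    return "$status"
}
```

If only the external record differs from the other two, both the archive-vs-external and the internal-vs-external comparisons fail, so two messages are reported for that file.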

Automatic repair with PAR2

If validation fails and PAR2 recovery data is available (generated with --repair), the tool automatically attempts repair:

  • During validation (-V --repair): reconstructs corrupted files using parity blocks, then revalidates.
  • During extraction (-x --repair): repairs on the fly before writing files to disk.
  • Strict vs. lenient mode: Level 3 validation can be configured to accept two-point agreement with a warning (lenient) or require three-point agreement (strict). Currently the implementation is strict.

Use cases

  • High-integrity backups: create archives with both internal and external metadata, then validate with three-point consensus to ensure no silent corruption.
  • Forensic archiving: store metadata separately (external DB) for independent verification; compare with embedded metadata to detect tampering.
  • Resilient storage: combine three-point validation with PAR2 recovery to automatically detect and repair corruption.

Design Goals

  1. GNU tar compatibility: accept the same flags and produce archives that standard tools can read.
  2. Multi-format core: a single tool for tar, zip, cpio, and future formats.
  3. Integrity and recovery: built-in hashing (BLAKE3/CRC32) and PAR2 repair data.
  4. Extensible metadata: store file hashes, timestamps, and other metadata inside the archive or in an external store.
  5. Performance: competitive with native tools, leverage Go's concurrency for parallel hashing and compression.

Project Status

This project is actively developed. The core archive formats (PAX, USTAR, GNU, ZIP, CPIO, AR) and compression (gzip, bzip2, xz, LZW) are stable and usable. The roadmap is organized into phases:

  • Phase 1: Foundation & core archive structure
  • Phase 2: Archive format support (PAX, USTAR, GNU, ZIP, CPIO, AR)
  • Phase 3: Compression format support (gzip, bzip2, xz, LZW)
  • Phase 4: Metadata directory & hashing (verification & PAR2 generation)
  • Phase 5: External SQLite3 metadata store
  • Phase 6: Archive validation & PAR2 repair (validation with external SQLite DB, cross-validation, exit-code error types, post-creation repair generation, extraction using external DB, and repair during validation implemented)
  • Phase 7: Advanced GNU tar features (completed: CLI flag parity, ZIP encryption, streaming; split archives deferred)
  • Phase 8: Testing & benchmarking
  • Phase 9: Documentation & distribution

See ROADMAP.md for detailed task breakdown and issue tracking.

Building from Source

git clone https://git.lan.thwap.org/thwap/manta-archiver.git
cd manta-archiver
go build ./cmd/reptar
./reptar --help

Run tests:

go test ./...

Contributing

Contributions are welcome! Please open an issue to discuss proposed changes or new features before submitting a pull request.

Development Setup

This project uses pre-commit to ensure code quality and consistency. After cloning the repository, install the pre-commit hooks:

# Install pre-commit (if not already installed)
pip install pre-commit

# Install the hooks
pre-commit install

# Optionally, run hooks on all files
pre-commit run --all-files

The hooks include:

  • Go formatting (gofmt)
  • Go linting (go vet, optional golangci-lint - see note below)
  • Markdown linting
  • YAML linting
  • Spell checking
  • General file hygiene (trailing whitespace, end-of-file newlines, etc.)

Note on golangci-lint: The configuration includes an optional golangci-lint hook (commented out by default). To enable it:

  1. Install golangci-lint: go install github.com/golangci/golangci-lint/cmd/golangci-lint@latest
  2. Uncomment the golangci-lint section in .pre-commit-config.yaml
  3. Run pre-commit install --hook-type pre-commit to update hooks

All hooks run automatically on git commit. You can also run them manually with pre-commit run --all-files.

Acknowledgments

  • The Go standard library archive/tar and archive/zip packages
  • GNU tar for the comprehensive command-line interface specification
  • The PAR2 specification for error-correcting archive recovery

Manta Archiver: because your data deserves a safety net.