# Mega Tools: Web Directory Walker, Batch Downloader, and URL Filter
A suite of command-line tools for recursively walking web directories, filtering URLs, efficiently downloading files in bulk, and displaying progress.
## Overview
This project provides four complementary tools:
- mega-walk: Recursively traverses HTTP directories to discover files and subdirectories
- mega-fetch: Downloads files from a list of URLs with support for resumable downloads, concurrent workers, and structured output
- mega-filter: Filters URLs based on configurable accept/reject patterns using regular expressions
- mega-progress: Displays real-time progress bar for download operations when piped from mega-fetch
## Installation

### Prerequisites

- Go 1.16 or higher

### Building from source
```sh
# Clone the repository
git clone <repository-url>
cd <repository-directory>

# Build all tools
make
```

Alternatively, build manually:

```sh
go build -o mega-walk ./cmd/mega-walk
go build -o mega-fetch ./cmd/mega-fetch
go build -o mega-filter ./cmd/mega-filter
go build -o mega-progress ./cmd/mega-progress
```
## Usage

### mega-walk: Web Directory Walker

Recursively traverses web directories to discover files and URLs.

```sh
./mega-walk [-no-head] [-c CONCURRENCY] <base_url>
```
Options:

- `-no-head`: Disable HEAD requests and omit content-length from output
- `-c CONCURRENCY`: Maximum number of concurrent requests (default: 10)
Example:

```sh
# Basic usage with content-length output
./mega-walk http://ftp.openbsd.org/pub/OpenBSD/7.8/

# Without HEAD requests (faster)
./mega-walk -no-head http://example.com/dir/

# With higher concurrency
./mega-walk -c 20 http://example.com/dir/
```
Output Format:

- With `-no-head`: one URL per line
- Without `-no-head`: `URL,content-length` format
Features:
- Recursively follows links within the same domain
- Skips query parameters and fragments
- Respects directory boundaries
- Handles common web server directory listings
- Concurrent request processing
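The same-domain and directory-boundary rules above can be sketched with the standard library's `net/url` package. The `inScope` helper below is hypothetical; the actual mega-walk logic may differ:

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// inScope reports whether a discovered link stays on the same host as the
// base URL, remains under its base path, and carries no query or fragment.
// Hypothetical helper for illustration only.
func inScope(base, link string) bool {
	b, err := url.Parse(base)
	if err != nil {
		return false
	}
	u, err := b.Parse(link) // resolves relative links against the base
	if err != nil {
		return false
	}
	if u.RawQuery != "" || u.Fragment != "" {
		return false // skip query parameters (e.g. sort links) and fragments
	}
	return u.Host == b.Host && strings.HasPrefix(u.Path, b.Path)
}

func main() {
	base := "http://example.com/pub/"
	fmt.Println(inScope(base, "http://example.com/pub/7.8/")) // true: same tree
	fmt.Println(inScope(base, "../other/"))                   // false: escapes base path
	fmt.Println(inScope(base, "?C=N;O=D"))                    // false: sort link with query
}
```

Resolving each link against the base URL before checking the path prefix is what keeps relative links like `../` from escaping the starting directory.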
### mega-filter: URL Filter Tool

Filters URLs from a list based on configurable accept and reject patterns using regular expressions.

```sh
./mega-filter <filter.yaml|filter.json> <input.txt> [output.txt]
```
Arguments:

- `filter.yaml|filter.json`: Configuration file with accept/reject patterns (YAML or JSON format)
- `input.txt`: Input file containing URLs (one per line); use `-` for stdin
- `output.txt`: Optional output file (if omitted, writes to stdout)
Filter File Format:

The filter configuration file can be in YAML or JSON format and contains two optional arrays of regular expression patterns.

YAML Example (filter-example.yaml):

```yaml
accept:
  - "^.*\\.img\\.xz$"
reject: []
```

JSON Equivalent:

```json
{
  "accept": [
    "^.*\\.img\\.xz$"
  ],
  "reject": []
}
```
Filtering Logic:

- If no `accept` patterns are specified, all URLs are initially accepted
- If `accept` patterns exist, a URL must match at least one pattern to be considered
- If a URL matches any `reject` pattern, it is excluded regardless of accept matches
- URLs are processed line by line from the input file
Example Usage:

```sh
# Filter URLs to only include .img.xz files
./mega-filter filter-example.yaml urls.txt filtered-urls.txt

# Filter from stdin and output to stdout
cat urls.txt | ./mega-filter filter-example.yaml - | head -20
```
### mega-fetch: Batch File Downloader

Downloads files from a list of URLs with advanced features.

```sh
./mega-fetch [OPTIONS] <url_file> <output_dir>
```

Options:

- `-w WORKERS`: Number of concurrent workers (default: 1)
- `-q`: Quiet mode (suppress error output)

Use `-` as the url_file argument to read from stdin.
Example:

```sh
# Basic download with 1 worker
./mega-fetch urls.txt ./downloads/

# Download with 5 concurrent workers
./mega-fetch -w 5 urls.txt ./downloads/

# Download from stdin with quiet mode
cat urls.txt | ./mega-fetch -w 3 -q - ./downloads/
```
Output Format:

Each processed file outputs one line in the format: `local_path,bytes_downloaded,status`

Where status is one of: `new`, `resumed`, `skipped`, or `error`.

Special Total Line:

Before processing begins, mega-fetch outputs a special line, `total,<count>`, where `<count>` is the total number of files to process. This is used by mega-progress to display accurate progress.
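A consumer of this output format could parse it roughly as follows. The `parseLine` helper is hypothetical, not mega-progress's actual parser:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseLine splits one mega-fetch output line. The first line is
// "total,<count>"; every later line is "local_path,bytes,status".
// Illustrative sketch only.
func parseLine(line string) (path string, n int64, status string, err error) {
	parts := strings.SplitN(line, ",", 3)
	if len(parts) == 2 && parts[0] == "total" {
		n, err = strconv.ParseInt(parts[1], 10, 64)
		return "total", n, "", err
	}
	if len(parts) != 3 {
		return "", 0, "", fmt.Errorf("malformed line: %q", line)
	}
	n, err = strconv.ParseInt(parts[1], 10, 64)
	return parts[0], n, parts[2], err
}

func main() {
	for _, l := range []string{"total,42", "downloads/a.img.xz,1048576,new"} {
		p, n, s, err := parseLine(l)
		if err != nil {
			panic(err)
		}
		fmt.Println(p, n, s)
	}
}
```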
Features:
- Resumable downloads: Automatically resumes interrupted downloads
- Smart caching: Skips files that already exist with matching size
- Concurrent downloads: Multiple workers for faster downloads
- Structured output: Produces machine-readable output for further processing
- Graceful shutdown: Handles interrupt signals properly
### mega-progress: Progress Display Tool

Displays a real-time progress bar for download operations when piped from mega-fetch.

```sh
./mega-progress
```
This tool reads from stdin and expects the output format from mega-fetch. It automatically displays:
- Progress bar with percentage completion
- File count (New, Resumed, Skipped, Total)
- Total bytes transferred
- Download speed
- Color-coded output
Example Usage:

```sh
# Full pipeline with progress display
./mega-walk http://example.com/dir/ | \
  ./mega-fetch -w 5 - ./downloads/ | \
  ./mega-progress

# Or with filtering
./mega-walk http://example.com/dir/ | \
  ./mega-filter filter-example.yaml - | \
  ./mega-fetch -w 5 - ./downloads/ | \
  ./mega-progress
```
Features:
- Real-time progress updates
- Color-coded status display
- Terminal-aware output (cleans up display lines)
- Speed calculation and human-readable sizes
- Graceful shutdown handling
- Spinner display before total count is known
## Advanced Usage

### Pipeline Examples

Combine all tools for a complete workflow.

Basic download workflow:

```sh
# Walk a directory and download all files with progress
./mega-walk http://ftp.openbsd.org/pub/OpenBSD/7.8/ | \
  ./mega-fetch -w 10 - ./downloads/ | \
  ./mega-progress
```
Advanced workflow with filtering:

```sh
# Walk a directory, filter for specific files, then download with progress
./mega-walk http://example.com/media/ > all-urls.txt

# Filter to only include .img.xz files using the example filter
./mega-filter filter-example.yaml all-urls.txt media-urls.txt

# Download filtered media files with progress tracking
cat media-urls.txt | ./mega-fetch -w 8 - ./downloads/ | ./mega-progress
```
Complete one-liner:

```sh
./mega-walk http://example.com/data/ | \
  ./mega-filter filter-example.yaml - | \
  ./mega-fetch -w 5 - ./downloads/ | \
  ./mega-progress
```
Saving results to a file while showing progress:

```sh
# Use tee to save results and show progress
./mega-walk http://example.com/ | \
  ./mega-fetch -w 5 - ./downloads/ | \
  tee download-results.txt | \
  ./mega-progress
```
### Makefile Targets

The included Makefile provides convenient shortcuts:

```sh
# Build all tools (including mega-progress)
make

# Clean build artifacts
make clean

# Run the test workflow
make test
```
The `make test` command will:
- Build all tools
- Test mega-walk with OpenBSD's FTP server
- Test mega-fetch with the discovered URLs
- Test mega-filter with a sample URL list and the provided filter-example.yaml
## How It Works

### mega-walk Internals

- Starts at the base URL and fetches the HTML content
- Parses the HTML to extract all `<a href="...">` links
- Filters links to stay within the same domain and base path
- Recursively processes directories using concurrent goroutines
- Outputs discovered URLs (with optional content-length) to stdout
### mega-fetch Internals

- Reads URLs from a file or stdin
- Outputs the `total,<count>` line with the total file count
- For each URL:
  - Checks if the file already exists locally
  - Performs a HEAD request to check the file size
  - Resumes the download if a partial file exists and the server supports it
  - Downloads a fresh copy if needed
- Outputs `local_path,bytes,status` for each processed file
- Uses worker goroutines for concurrent downloads
### mega-progress Internals

- Reads mega-fetch's structured output from stdin
- Parses the `total,<count>` line to learn the total file count
- Updates counters for new, resumed, and skipped files
- Calculates transfer speed and progress percentage
- Displays a real-time progress bar with ANSI escape codes
- Shows a spinner animation before the total count is known
- Cleans up terminal output on completion
## File Structure

```
.
├── cmd/
│   ├── mega-walk/
│   │   └── main.go        # Directory walking tool
│   ├── mega-fetch/
│   │   └── main.go        # Batch downloading tool
│   ├── mega-filter/
│   │   └── main.go        # URL filtering tool
│   └── mega-progress/
│       └── main.go        # Progress display tool
├── konsoru/
│   ├── bar/               # Progress bar library
│   ├── color/             # Color utilities
│   ├── cursor/            # Cursor control
│   ├── style/             # Terminal styles
│   └── util/              # Utility functions
├── signal/
│   └── signal.go          # Graceful shutdown handling
├── vendor/                # External dependencies
├── Makefile               # Build and test automation
├── filter-example.yaml    # Example filter configuration
└── README.md              # This file
```
## Dependencies

- `golang.org/x/net/html`: HTML parsing for mega-walk
- `golang.org/x/term`: Terminal handling for the progress display
- `gopkg.in/yaml.v3`: YAML parsing for mega-filter

All dependencies are vendored in the `vendor/` directory.
## Limitations
- mega-walk: Only works with web servers that provide directory listings in HTML format
- mega-fetch: Resume functionality depends on server support for Range headers
- mega-filter: Requires basic understanding of regular expressions for pattern creation
- All tools are designed for public HTTP/HTTPS servers, not authenticated sites
## Troubleshooting

### Common Issues

- **"Error: HEAD request failed"**
  - Some servers block HEAD requests
  - Use the `-no-head` flag with mega-walk to skip HEAD requests
  - mega-fetch will fall back to GET requests automatically
- **Progress bar not displaying**
  - Ensure the terminal supports ANSI escape codes
  - Pipe mega-fetch output through mega-progress
- **Downloads stopping prematurely**
  - Check network connectivity
  - Reduce the number of workers with the `-w` flag
  - Some servers may rate-limit concurrent connections
- **No output from mega-progress**
  - Ensure mega-fetch is producing structured output
  - Check that mega-fetch is not using quiet mode (`-q`) if you want to see errors
## Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Test thoroughly
- Submit a pull request
## License
[Specify your license here]
## Acknowledgments
- Built with Go's excellent standard library
- Inspired by traditional Unix tools like wget and curl
- Uses the konsoru library for terminal UI components