# Mega Tools: Web Directory Walker, Batch Downloader, and URL Filter
A suite of command-line tools for recursively walking web directories, filtering URLs, efficiently downloading files in bulk, and displaying progress.
## Overview
This project provides four complementary tools:
- mega-walk: Recursively traverses HTTP directories to discover files and subdirectories
- mega-fetch: Downloads files from a list of URLs with support for resumable downloads, concurrent workers, and structured output
- mega-filter: Filters URLs based on configurable accept/reject patterns using regular expressions
- mega-progress: Displays real-time progress bar for download operations when piped from mega-fetch
## Installation

### Prerequisites

- Go 1.16 or higher

### Building from source
```sh
# Clone the repository
git clone <repository-url>
cd <repository-directory>

# Build all tools
make
```

Alternatively, build manually:

```sh
go build -o mega-walk ./cmd/mega-walk
go build -o mega-fetch ./cmd/mega-fetch
go build -o mega-filter ./cmd/mega-filter
go build -o mega-progress ./cmd/mega-progress
```
## Usage

### mega-walk: Web Directory Walker

Recursively traverses web directories to discover files and URLs.

```sh
./mega-walk [-no-head] [-c CONCURRENCY] <base_url>
```
Options:

- `-no-head`: Disable HEAD requests and omit content-length from output
- `-c CONCURRENCY`: Maximum number of concurrent requests (default: 10)
Example:

```sh
# Basic usage with content-length output
./mega-walk http://ftp.openbsd.org/pub/OpenBSD/7.8/

# Without HEAD requests (faster)
./mega-walk -no-head http://example.com/dir/

# With higher concurrency
./mega-walk -c 20 http://example.com/dir/
```
Output Format:

- With `-no-head`: one URL per line
- Without `-no-head`: `URL,content-length` format
Features:
- Recursively follows links within the same domain
- Skips query parameters and fragments
- Respects directory boundaries
- Handles common web server directory listings
- Concurrent request processing
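The same-domain and directory-boundary rules above can be sketched with the standard library's `net/url` package. The `inScope` helper below is hypothetical; the actual mega-walk logic may differ:

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// inScope reports whether a discovered link stays on the same host as the
// base URL, remains under its base path, and carries no query or fragment.
// Hypothetical helper for illustration only.
func inScope(base, link string) bool {
	b, err := url.Parse(base)
	if err != nil {
		return false
	}
	u, err := b.Parse(link) // resolves relative links against the base
	if err != nil {
		return false
	}
	if u.RawQuery != "" || u.Fragment != "" {
		return false // skip query parameters (e.g. sort links) and fragments
	}
	return u.Host == b.Host && strings.HasPrefix(u.Path, b.Path)
}

func main() {
	base := "http://example.com/pub/"
	fmt.Println(inScope(base, "http://example.com/pub/7.8/")) // true: same tree
	fmt.Println(inScope(base, "../other/"))                   // false: escapes base path
	fmt.Println(inScope(base, "?C=N;O=D"))                    // false: sort link with query
}
```

Resolving each link against the base URL before checking the path prefix is what keeps relative links like `../` from escaping the starting directory.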
### mega-filter: URL Filter Tool

Filters URLs from a list based on configurable accept and reject patterns using regular expressions.

```sh
./mega-filter <filter.yaml|filter.json> <input.txt> [output.txt]
```
Arguments:

- `filter.yaml|filter.json`: Configuration file with accept/reject patterns (YAML or JSON format)
- `input.txt`: Input file containing URLs (one per line); use `-` for stdin
- `output.txt`: Optional output file (if omitted, writes to stdout)
Filter File Format:

The filter configuration file can be in YAML or JSON format and contains two optional arrays of regular expression patterns.

YAML Example (filter-example.yaml):

```yaml
accept:
  - "^.*\\.img\\.xz$"
reject: []
```

JSON Equivalent:

```json
{
  "accept": [
    "^.*\\.img\\.xz$"
  ],
  "reject": []
}
```
Filtering Logic:

- If no `accept` patterns are specified, all URLs are initially accepted
- If `accept` patterns exist, a URL must match at least one pattern to be considered
- If a URL matches any `reject` pattern, it is excluded regardless of accept matches
- URLs are processed line by line from the input file
Example Usage:

```sh
# Filter URLs to only include .img.xz files
./mega-filter filter-example.yaml urls.txt filtered-urls.txt

# Filter from stdin and output to stdout
cat urls.txt | ./mega-filter filter-example.yaml - | head -20
```
### mega-fetch: Batch File Downloader

Downloads files from a list of URLs with advanced features.

```sh
./mega-fetch [OPTIONS] <url_file> <output_dir>
```

Options:

- `-w WORKERS`: Number of concurrent workers (default: 1)
- `-q`: Quiet mode (suppress error output)

Use `-` as the url_file argument to read from stdin.
Example:

```sh
# Basic download with 1 worker
./mega-fetch urls.txt ./downloads/

# Download with 5 concurrent workers
./mega-fetch -w 5 urls.txt ./downloads/

# Download from stdin with quiet mode
cat urls.txt | ./mega-fetch -w 3 -q - ./downloads/
```
Output Format:

Each processed file outputs one line in the format: `local_path,bytes_downloaded,status`

Where status is one of: `new`, `resumed`, `skipped`, or `error`.

Special Total Line:

Before processing begins, mega-fetch outputs a special line, `total,<count>`, where `<count>` is the total number of files to process. This is used by mega-progress to display accurate progress.
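A consumer of this output format could parse it roughly as follows. The `parseLine` helper is hypothetical, not mega-progress's actual parser:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseLine splits one mega-fetch output line. The first line is
// "total,<count>"; every later line is "local_path,bytes,status".
// Illustrative sketch only.
func parseLine(line string) (path string, n int64, status string, err error) {
	parts := strings.SplitN(line, ",", 3)
	if len(parts) == 2 && parts[0] == "total" {
		n, err = strconv.ParseInt(parts[1], 10, 64)
		return "total", n, "", err
	}
	if len(parts) != 3 {
		return "", 0, "", fmt.Errorf("malformed line: %q", line)
	}
	n, err = strconv.ParseInt(parts[1], 10, 64)
	return parts[0], n, parts[2], err
}

func main() {
	for _, l := range []string{"total,42", "downloads/a.img.xz,1048576,new"} {
		p, n, s, err := parseLine(l)
		if err != nil {
			panic(err)
		}
		fmt.Println(p, n, s)
	}
}
```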
Features:
- Resumable downloads: Automatically resumes interrupted downloads
- Smart caching: Skips files that already exist with matching size
- Concurrent downloads: Multiple workers for faster downloads
- Structured output: Produces machine-readable output for further processing
- Graceful shutdown: Handles interrupt signals properly
### mega-progress: Progress Display Tool

Displays a real-time progress bar for download operations when piped from mega-fetch.

```sh
./mega-progress
```
This tool reads from stdin and expects the output format from mega-fetch. It automatically displays:
- Progress bar with percentage completion
- File count (New, Resumed, Skipped, Total)
- Total bytes transferred
- Download speed
- Color-coded output
Example Usage:

```sh
# Full pipeline with progress display
./mega-walk http://example.com/dir/ | \
  ./mega-fetch -w 5 - ./downloads/ | \
  ./mega-progress

# Or with filtering
./mega-walk http://example.com/dir/ | \
  ./mega-filter filter-example.yaml - | \
  ./mega-fetch -w 5 - ./downloads/ | \
  ./mega-progress
```
Features:
- Real-time progress updates
- Color-coded status display
- Terminal-aware output (cleans up display lines)
- Speed calculation and human-readable sizes
- Graceful shutdown handling
- Spinner display before total count is known
## Advanced Usage

### Pipeline Examples

Combine all tools for a complete workflow.

Basic download workflow:

```sh
# Walk a directory and download all files with progress
./mega-walk http://ftp.openbsd.org/pub/OpenBSD/7.8/ | \
  ./mega-fetch -w 10 - ./downloads/ | \
  ./mega-progress
```
Advanced workflow with filtering:

```sh
# Walk a directory, filter for specific files, then download with progress
./mega-walk http://example.com/media/ > all-urls.txt

# Filter to only include .img.xz files using the example filter
./mega-filter filter-example.yaml all-urls.txt media-urls.txt

# Download filtered media files with progress tracking
cat media-urls.txt | ./mega-fetch -w 8 - ./downloads/ | ./mega-progress
```
Complete one-liner:

```sh
./mega-walk http://example.com/data/ | \
  ./mega-filter filter-example.yaml - | \
  ./mega-fetch -w 5 - ./downloads/ | \
  ./mega-progress
```
Saving results to a file while showing progress:

```sh
# Use tee to save results and show progress
./mega-walk http://example.com/ | \
  ./mega-fetch -w 5 - ./downloads/ | \
  tee download-results.txt | \
  ./mega-progress
```
### Makefile Targets

The included Makefile provides convenient shortcuts:

```sh
# Build all tools (including mega-progress)
make

# Clean build artifacts
make clean

# Run the test workflow
make test
```
The `make test` command will:
- Build all tools
- Test mega-walk with OpenBSD's FTP server
- Test mega-fetch with the discovered URLs
- Test mega-filter with a sample URL list and the provided filter-example.yaml
## How It Works

### mega-walk Internals

- Starts at the base URL and fetches the HTML content
- Parses the HTML to extract all `<a href="...">` links
- Filters links to stay within the same domain and base path
- Recursively processes directories using concurrent goroutines
- Outputs discovered URLs (with optional content-length) to stdout
### mega-fetch Internals

- Reads URLs from a file or stdin
- Outputs the `total,<count>` line with the total file count
- For each URL:
  - Checks if the file already exists locally
  - Performs a HEAD request to check the file size
  - Resumes the download if a partial file exists and the server supports it
  - Downloads a fresh copy if needed
- Outputs `local_path,bytes,status` for each processed file
- Uses worker goroutines for concurrent downloads
### mega-progress Internals

- Reads mega-fetch's structured output from stdin
- Parses the `total,<count>` line to learn the total file count
- Updates counters for new, resumed, and skipped files
- Calculates transfer speed and progress percentage
- Displays a real-time progress bar with ANSI escape codes
- Shows a spinner animation before the total count is known
- Cleans up terminal output on completion
## File Structure

```
.
├── cmd/
│   ├── mega-walk/
│   │   └── main.go        # Directory walking tool
│   ├── mega-fetch/
│   │   └── main.go        # Batch downloading tool
│   ├── mega-filter/
│   │   └── main.go        # URL filtering tool
│   └── mega-progress/
│       └── main.go        # Progress display tool
├── konsoru/
│   ├── bar/               # Progress bar library
│   ├── color/             # Color utilities
│   ├── cursor/            # Cursor control
│   ├── style/             # Terminal styles
│   └── util/              # Utility functions
├── signal/
│   └── signal.go          # Graceful shutdown handling
├── vendor/                # External dependencies
├── Makefile               # Build and test automation
├── filter-example.yaml    # Example filter configuration
└── README.md              # This file
```
## Dependencies

- `golang.org/x/net/html`: HTML parsing for mega-walk
- `golang.org/x/term`: Terminal handling for the progress display
- `gopkg.in/yaml.v3`: YAML parsing for mega-filter

All dependencies are vendored in the `vendor/` directory.
## Limitations
- mega-walk: Only works with web servers that provide directory listings in HTML format
- mega-fetch: Resume functionality depends on server support for Range headers
- mega-filter: Requires basic understanding of regular expressions for pattern creation
- All tools are designed for public HTTP/HTTPS servers, not authenticated sites
## Troubleshooting

### Common Issues

- **"Error: HEAD request failed"**
  - Some servers block HEAD requests
  - Use the `-no-head` flag with mega-walk to skip HEAD requests
  - mega-fetch will fall back to GET requests automatically
- **Progress bar not displaying**
  - Ensure the terminal supports ANSI escape codes
  - Pipe mega-fetch output through mega-progress
- **Downloads stopping prematurely**
  - Check network connectivity
  - Reduce the number of workers with the `-w` flag
  - Some servers may rate-limit concurrent connections
- **No output from mega-progress**
  - Ensure mega-fetch is producing structured output
  - Check that mega-fetch is not using quiet mode (`-q`) if you want to see errors
## Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Test thoroughly
- Submit a pull request
## License
[Specify your license here]
## Acknowledgments
- Built with Go's excellent standard library
- Inspired by traditional Unix tools like wget and curl
- Uses the konsoru library for terminal UI components