No description
- Go 98.5%
- Makefile 1.5%
| .gitignore | ||
| .pre-commit-config.yaml | ||
| batch.go | ||
| cache.go | ||
| config.go | ||
| disk_cache.go | ||
| eviction.go | ||
| example_test.go | ||
| go.mod | ||
| Makefile | ||
| operations.go | ||
| README.md | ||
| recovery.go | ||
Table of Contents
- Overview
- Core Design Principles
- Data Structures
- Configuration
- File Structure on Disk
- Thread Safety Guarantees
- Core Operations
- Memory Management
- Eviction Strategy
- Crash Recovery
- Atomicity Guarantees
- Optimization Features
- Statistics and Monitoring
- Error Handling
- API Interface
- Background Tasks
- Example Usage
- Testing Considerations
Overview
A minimal, memory/IO-efficient, thread-safe, atomic, disk-backed cache library using only the Go standard library.
Core Design Principles
- Thread-Safe: All operations safe for concurrent access
- Atomicity: Cache state remains consistent across crashes/interruptions
- Disk-Backed: Items persist beyond process lifetime
- Memory Efficient: Minimal in-memory overhead, configurable limits
- IO Efficient: Batched/serialized disk operations, minimal seeks
- Standard Library Only: No external dependencies
- Flexible: Configurable eviction, serialization, and storage strategies
Data Structures
// Core cache entry
type CacheItem struct {
Key string
Value []byte
Expires time.Time
Created time.Time
Modified time.Time
AccessCount uint64
}
// In-memory index (minimal metadata)
type ItemMetadata struct {
FileOffset int64 // Position in data file
Size int64 // Size in bytes
Expires time.Time
LastAccessed time.Time
AccessCount uint64
}
// Main cache structure
type DiskCache struct {
mu sync.RWMutex
config Config
index map[string]ItemMetadata // In-memory lookup
dataFile *os.File // Append-only data log
indexFile *os.File // Persistent index
freeList []DataSegment // Reusable disk space tracking
stats CacheStats
evictionQueue *EvictionQueue // For policy-based removal
closed bool
}
Configuration
type Config struct {
// Storage paths
DataDir string
// Size limits
MaxDiskSize int64 // Maximum disk usage in bytes
MaxMemoryItems int // Maximum in-memory items (0 = unlimited)
// Eviction policy
EvictionPolicy EvictionPolicy
// Expiration
DefaultTTL time.Duration
CleanupInterval time.Duration
// Performance
WriteBufferSize int // Buffer size for batched writes
SyncOnWrite bool // Sync to disk on each write
// Compression
EnableCompression bool
CompressionLevel int
}
type EvictionPolicy int
const (
LRU EvictionPolicy = iota // Least Recently Used
LFU // Least Frequently Used
FIFO // First In First Out
SizeBased // Largest items first
)
File Structure on Disk
** Data File (append-only log)
- Format: Sequence of serialized CacheItem entries
- Name:
data.log - Structure: | magicnumber(uint32) | itemsize(uint64) | keysize(uint16) | key | value | checksum(uint32) | timestamp(int64) | expires(int64) |
** Index File (memory-mapped or regular file)
- Format: Fixed-size records for fast lookups
- Name:
index.idx - Structure: | keyhash(uint64) | fileoffset(int64) | size(int64) | expires(int64) | lastaccessed(int64) | flags(uint32) |
** Metadata File (JSON/TLV)
- Format: Cache configuration and statistics
- Name:
meta.json - Content: Config, total items, total size, creation date, last compaction
Thread Safety Guarantees
- Fine-grained locking: read/write locks where appropriate
- Atomic file operations via write-ahead logging
- Batch operations with transaction semantics
- No stale reads via version checking
Core Operations
** Get(key string) ([]byte, error)
- Acquire read lock
- Check in-memory index
- If expired → schedule deletion, return error
- Seek to data file offset
- Read and validate checksum
- Update access metadata (with write lock upgrade)
- Return data
** Set(key string, value []byte, ttl time.Duration) error
- Serialize to temporary buffer
- Acquire write lock
- Check capacity, evict if necessary
- Append to data file (with checksum)
- Update in-memory index
- Async index persistence
- Release lock
** Delete(key string) error
- Acquire write lock
- Mark index entry as deleted
- Add space to free list
- Async data file compaction trigger
- Update stats
** Exists(key string) bool
- Read lock
- Check index and expiration
- Return status
Memory Management
- Lazy loading: Only metadata in memory, values on disk
- Slab allocation for same-size items
- Object pooling for buffers
- Configurable maximum memory usage
Eviction Strategy
- Background goroutine for periodic cleanup
- Policy-based selection for removal
- Batch deletion to minimize IO
- Size-aware eviction to free sufficient space
Crash Recovery
- Write-ahead logging for atomic operations
- Checksum validation on read
- Index reconstruction from data log
- Transaction journal for multi-step operations
- Recovery on Open(): verify and rebuild if corrupted
Atomicity Guarantees
- Two-phase commit for batch operations
- Journaling before data modification
- fsync() control via configuration
- Power-fail safe through ordered writes
Optimization Features
- Bloom filter for quick non-existence checks
- Read-ahead buffering for sequential access
- Write coalescing for concurrent writes
- Cache warming on startup
- Background compaction of data file
Statistics and Monitoring
type CacheStats struct {
Hits uint64
Misses uint64
Evictions uint64
DiskUsage int64
MemoryUsage int64
AvgReadTime time.Duration
AvgWriteTime time.Duration
CorruptionErrors uint64
}
Error Handling
- Comprehensive error types
- Graceful degradation
- Automatic corruption recovery
- Resource exhaustion handling
API Interface
type Cache interface {
Get(key string) ([]byte, error)
Set(key string, value []byte) error
SetWithTTL(key string, value []byte, ttl time.Duration) error
Delete(key string) error
Exists(key string) bool
Clear() error
Stats() CacheStats
Close() error
Flush() error // Force sync to disk
}
// Batch operations
type BatchOperation struct {
Op OperationType
Key string
Value []byte
TTL time.Duration
}
func (dc *DiskCache) ExecuteBatch(ops []BatchOperation) error
Background Tasks
- Expiration scanner (configurable interval)
- Data file compaction
- Index persistence
- Statistics aggregation
Example Usage
config := Config{
DataDir: "/var/cache/myapp",
MaxDiskSize: 1024 * 1024 * 100, // 100MB
EvictionPolicy: LRU,
DefaultTTL: time.Hour,
}
cache, err := Open(config)
defer cache.Close()
err = cache.Set("user:123", []byte("data"), time.Minute*30)
data, err := cache.Get("user:123")
Testing Considerations
- Concurrency stress tests
- Crash recovery simulations
- Disk full scenarios
- Corruption injection
- Performance benchmarks