No description
  • Go 98.5%
  • Makefile 1.5%
Find a file
2026-02-02 09:31:54 -08:00
.gitignore chore: add aider cache files to gitignore 2026-02-02 09:31:54 -08:00
.pre-commit-config.yaml feat: add pre-commit hooks and Makefile for Go development 2026-01-30 09:31:31 -08:00
batch.go docs: add comprehensive GoDoc comments and unit tests 2026-01-30 08:11:37 -08:00
cache.go docs: add comprehensive GoDoc comments and unit tests 2026-01-30 08:11:37 -08:00
config.go docs: add comprehensive GoDoc comments and unit tests 2026-01-30 08:11:37 -08:00
disk_cache.go docs: add comprehensive GoDoc comments and unit tests 2026-01-30 08:11:37 -08:00
eviction.go docs: add comprehensive GoDoc comments and unit tests 2026-01-30 08:11:37 -08:00
example_test.go docs: add comprehensive GoDoc comments and unit tests 2026-01-30 08:11:37 -08:00
go.mod chore: initialize go module with version 1.24.11 2026-01-30 06:25:35 -08:00
Makefile feat: add pre-commit hooks and Makefile for Go development 2026-01-30 09:31:31 -08:00
operations.go feat: store key length in data file and improve close synchronization 2026-01-30 07:31:03 -08:00
README.md docs: add comprehensive README for disk cache library 2026-01-30 06:17:17 -08:00
recovery.go refactor: remove unused imports and add time import to batch 2026-01-30 06:48:16 -08:00

Table of Contents

  1. Overview
  2. Core Design Principles
  3. Data Structures
  4. Configuration
  5. File Structure on Disk
  6. Thread Safety Guarantees
  7. Core Operations
  8. Memory Management
  9. Eviction Strategy
  10. Crash Recovery
  11. Atomicity Guarantees
  12. Optimization Features
  13. Statistics and Monitoring
  14. Error Handling
  15. API Interface
  16. Background Tasks
  17. Example Usage
  18. Testing Considerations

Overview

A minimal, memory/IO-efficient, thread-safe, atomic, disk-backed cache library using only the Go standard library.

Core Design Principles

  • Thread-Safe: All operations safe for concurrent access
  • Atomicity: Cache state remains consistent across crashes/interruptions
  • Disk-Backed: Items persist beyond process lifetime
  • Memory Efficient: Minimal in-memory overhead, configurable limits
  • IO Efficient: Batched/serialized disk operations, minimal seeks
  • Standard Library Only: No external dependencies
  • Flexible: Configurable eviction, serialization, and storage strategies

Data Structures

// Core cache entry
type CacheItem struct {
    Key      string
    Value    []byte
    Expires  time.Time
    Created  time.Time
    Modified time.Time
    AccessCount uint64
}

// In-memory index (minimal metadata)
type ItemMetadata struct {
    FileOffset    int64    // Position in data file
    Size          int64    // Size in bytes
    Expires       time.Time
    LastAccessed  time.Time
    AccessCount   uint64
}

// Main cache structure
type DiskCache struct {
    mu            sync.RWMutex
    config        Config
    index         map[string]ItemMetadata  // In-memory lookup
    dataFile      *os.File                 // Append-only data log
    indexFile     *os.File                 // Persistent index
    freeList      []DataSegment            // Reusable disk space tracking
    stats         CacheStats
    evictionQueue *EvictionQueue           // For policy-based removal
    closed        bool
}

Configuration

type Config struct {
    // Storage paths
    DataDir      string

    // Size limits
    MaxDiskSize  int64     // Maximum disk usage in bytes
    MaxMemoryItems int     // Maximum in-memory items (0 = unlimited)

    // Eviction policy
    EvictionPolicy EvictionPolicy

    // Expiration
    DefaultTTL   time.Duration
    CleanupInterval time.Duration

    // Performance
    WriteBufferSize int    // Buffer size for batched writes
    SyncOnWrite    bool    // Sync to disk on each write

    // Compression
    EnableCompression bool
    CompressionLevel  int
}

type EvictionPolicy int
const (
    LRU EvictionPolicy = iota    // Least Recently Used
    LFU                          // Least Frequently Used
    FIFO                         // First In First Out
    SizeBased                    // Largest items first
)

File Structure on Disk

** Data File (append-only log)

  • Format: Sequence of serialized CacheItem entries
  • Name: data.log
  • Structure: | magicnumber(uint32) | itemsize(uint64) | keysize(uint16) | key | value | checksum(uint32) | timestamp(int64) | expires(int64) |

** Index File (memory-mapped or regular file)

  • Format: Fixed-size records for fast lookups
  • Name: index.idx
  • Structure: | keyhash(uint64) | fileoffset(int64) | size(int64) | expires(int64) | lastaccessed(int64) | flags(uint32) |

** Metadata File (JSON/TLV)

  • Format: Cache configuration and statistics
  • Name: meta.json
  • Content: Config, total items, total size, creation date, last compaction

Thread Safety Guarantees

  • Fine-grained locking: read/write locks where appropriate
  • Atomic file operations via write-ahead logging
  • Batch operations with transaction semantics
  • No stale reads via version checking

Core Operations

** Get(key string) ([]byte, error)

  1. Acquire read lock
  2. Check in-memory index
  3. If expired → schedule deletion, return error
  4. Seek to data file offset
  5. Read and validate checksum
  6. Update access metadata (with write lock upgrade)
  7. Return data

** Set(key string, value []byte, ttl time.Duration) error

  1. Serialize to temporary buffer
  2. Acquire write lock
  3. Check capacity, evict if necessary
  4. Append to data file (with checksum)
  5. Update in-memory index
  6. Async index persistence
  7. Release lock

** Delete(key string) error

  1. Acquire write lock
  2. Mark index entry as deleted
  3. Add space to free list
  4. Async data file compaction trigger
  5. Update stats

** Exists(key string) bool

  1. Read lock
  2. Check index and expiration
  3. Return status

Memory Management

  • Lazy loading: Only metadata in memory, values on disk
  • Slab allocation for same-size items
  • Object pooling for buffers
  • Configurable maximum memory usage

Eviction Strategy

  • Background goroutine for periodic cleanup
  • Policy-based selection for removal
  • Batch deletion to minimize IO
  • Size-aware eviction to free sufficient space

Crash Recovery

  • Write-ahead logging for atomic operations
  • Checksum validation on read
  • Index reconstruction from data log
  • Transaction journal for multi-step operations
  • Recovery on Open(): verify and rebuild if corrupted

Atomicity Guarantees

  • Two-phase commit for batch operations
  • Journaling before data modification
  • fsync() control via configuration
  • Power-fail safe through ordered writes

Optimization Features

  • Bloom filter for quick non-existence checks
  • Read-ahead buffering for sequential access
  • Write coalescing for concurrent writes
  • Cache warming on startup
  • Background compaction of data file

Statistics and Monitoring

type CacheStats struct {
    Hits          uint64
    Misses        uint64
    Evictions     uint64
    DiskUsage     int64
    MemoryUsage   int64
    AvgReadTime   time.Duration
    AvgWriteTime  time.Duration
    CorruptionErrors uint64
}

Error Handling

  • Comprehensive error types
  • Graceful degradation
  • Automatic corruption recovery
  • Resource exhaustion handling

API Interface

type Cache interface {
    Get(key string) ([]byte, error)
    Set(key string, value []byte) error
    SetWithTTL(key string, value []byte, ttl time.Duration) error
    Delete(key string) error
    Exists(key string) bool
    Clear() error
    Stats() CacheStats
    Close() error
    Flush() error  // Force sync to disk
}

// Batch operations
type BatchOperation struct {
    Op    OperationType
    Key   string
    Value []byte
    TTL   time.Duration
}

func (dc *DiskCache) ExecuteBatch(ops []BatchOperation) error

Background Tasks

  • Expiration scanner (configurable interval)
  • Data file compaction
  • Index persistence
  • Statistics aggregation

Example Usage

config := Config{
    DataDir:        "/var/cache/myapp",
    MaxDiskSize:    1024 * 1024 * 100, // 100MB
    EvictionPolicy: LRU,
    DefaultTTL:     time.Hour,
}

cache, err := Open(config)
defer cache.Close()

err = cache.Set("user:123", []byte("data"), time.Minute*30)
data, err := cache.Get("user:123")

Testing Considerations

  • Concurrency stress tests
  • Crash recovery simulations
  • Disk full scenarios
  • Corruption injection
  • Performance benchmarks