thwap/kotodama


Kotodama is a universal language toolchain framework written in Go. New programming languages, transpilers, and LSP tooling are defined in YAML, with the goal of building them within hours. It features a unified AST format with boolean-flag tokens and supports three operational modes: framework embedding, code generation, and source-to-source translation.

Go 98.8%
Makefile 1.2%


Mike 'Fuzzy' Partin a08499645a refactor: improve backtracking logic in parser with lexer state management		2026-03-23 03:29:30 -07:00
ast	feat: add extended universal node types for AST support	2026-03-22 17:33:37 -07:00
bytecode	feat: implement bytecode virtual machine with stack operations and arithmetic	2026-03-22 09:05:52 -07:00
cmd/kotodama	feat: add lsp command and server implementation	2026-03-22 13:34:22 -07:00
generator	feat: add comprehensive node type definitions and Lua action mappings	2026-03-22 17:33:43 -07:00
grammar	feat: add PrefixExpr rule to PrimaryExpr grammar	2026-03-23 03:29:18 -07:00
interpreter	feat: implement operator precedence climbing and update type function to typeof	2026-03-22 09:05:32 -07:00
jsonparser	feat: implement json parser and converter with ast integration	2026-03-22 10:21:50 -07:00
lexer	feat: add backtracking support with SaveState/RestoreState for lexers	2026-03-23 03:29:00 -07:00
lsp	feat: implement language server protocol completion support	2026-03-22 13:34:59 -07:00
parser	refactor: improve backtracking logic in parser with lexer state management	2026-03-23 03:29:30 -07:00
token	feat: implement token registry and update operator precedence	2026-03-22 17:34:54 -07:00
transform	feat: add JSON transformation engine with YAML rule support	2026-03-22 13:34:37 -07:00
transpile	feat: add JSON↔YAML transpilation functionality with Kotodama parser	2026-03-22 10:22:36 -07:00
vendor	fix: update go build constraints and formatting in vendor libraries	2026-03-22 17:32:08 -07:00
.gitignore	chore: update gitignore to exclude binaries, test output, generated files, editor artifacts, and OS-specific files	2026-03-22 11:19:55 -07:00
go.mod	feat: upgrade go version and add tls certificate utilities	2026-03-22 13:31:28 -07:00
go.sum	feat: upgrade go version and add tls certificate utilities	2026-03-22 13:31:28 -07:00
GUIDELINES.md	docs: generalize project guidelines to be less language-agnostic	2026-02-06 01:02:12 -08:00
LICENSE	feat: add apache 2.0 license and comprehensive documentation	2026-03-22 11:21:50 -07:00
Makefile	feat: add Makefile for build automation and testing	2026-03-22 04:57:27 -07:00
README.md	docs: update FOSS licensing language in README	2026-03-22 17:33:10 -07:00
ROADMAP.md	feat: add AST transformation engine and LSP server framework	2026-03-22 13:32:51 -07:00

README.md

Kotodama Language Toolkit

A universal language toolchain framework written in Go that enables:

Multi-modal operation (framework, code generator, transpiler)
Polyglot execution through YAML/JSON language definitions
Unified AST format with boolean token optimization
Complete compiler/interpreter/LSP tooling ecosystem
100% Free and Open Source

Kotodama (言霊) means "word spirit" in Japanese – the belief that words have power to shape reality.

🚀 Current Status

Phase 1: Foundation is ~85% complete with all core components implemented and tested.

✅ Completed Features

Core Infrastructure

Universal Tokenizer: Boolean flag token design with fast bitmask operations
Lexer Core Engine: Streaming lexer with error recovery and lookahead
DFA/NFA Pattern Matching: Regex-to-DFA compiler for YAML patterns
Unified AST Format: BSON-serializable nodes with source positions
Grammar Definition System: YAML loader with EBNF/BNF support
- YAML Grammar Definition: Complete YAML grammar definition (grammar/yaml.yaml) for parsing YAML with Kotodama itself (self-hosting of the definition format)
AST Transformation Engine: Pattern matching and rewrite rules for AST manipulation
- Pattern Matching: Type-based, data field, and child structure matching
- Rewrite Rules: Priority-based rule application with capture groups
- Visitor Integration: Works with existing visitor pattern for custom traversals

Execution Engines

Tree-Walk Interpreter: Expression evaluator with native function binding
Bytecode VM: Stack-based bytecode with debug information
Go Code Generator: Generate lexer/parser from YAML definitions
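
To make "stack-based bytecode" concrete, here is a minimal sketch of a stack machine that evaluates 1 + 2 * 3. The opcode names (OpPush, OpAdd, OpMul), the Instr type, and the Run function are invented for illustration and are not taken from the bytecode package.

```go
package main

import (
	"errors"
	"fmt"
)

// Op is a hypothetical opcode type; Kotodama's real instruction set may differ.
type Op byte

const (
	OpPush Op = iota // push Arg onto the stack
	OpAdd            // pop two values, push their sum
	OpMul            // pop two values, push their product
)

// Instr is one instruction with an optional integer operand.
type Instr struct {
	Op  Op
	Arg int
}

// Run executes prog and returns the single value left on the stack.
func Run(prog []Instr) (int, error) {
	var stack []int
	for _, in := range prog {
		switch in.Op {
		case OpPush:
			stack = append(stack, in.Arg)
		case OpAdd, OpMul:
			if len(stack) < 2 {
				return 0, errors.New("stack underflow")
			}
			a, b := stack[len(stack)-2], stack[len(stack)-1]
			stack = stack[:len(stack)-2]
			if in.Op == OpAdd {
				stack = append(stack, a+b)
			} else {
				stack = append(stack, a*b)
			}
		default:
			return 0, fmt.Errorf("unknown opcode %d", in.Op)
		}
	}
	if len(stack) != 1 {
		return 0, errors.New("program must leave exactly one value on the stack")
	}
	return stack[0], nil
}

func main() {
	// 1 + 2 * 3 compiled to postfix order: 1 2 3 * +
	prog := []Instr{{OpPush, 1}, {OpPush, 2}, {OpPush, 3}, {OpMul, 0}, {OpAdd, 0}}
	result, err := Run(prog)
	if err != nil {
		panic(err)
	}
	fmt.Println(result)
}
```

A compiler targeting such a machine emits operands before their operator, so operator precedence is resolved at compile time and the VM loop stays a flat switch.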

Practical Tools

JSON↔YAML Transpiler: Complete JSON parser using Kotodama's lexer/AST
Enhanced CLI Tool: Subcommand architecture with:
- repl: Interactive REPL
- run <file>: Execute Kotodama scripts (with shebang support)
- ast <file>: Output AST in Graphviz DOT format
- json2yaml / yaml2json: Data format conversion
- lsp: Start Language Server Protocol server
- version: Version information

LSP Server Framework

Language-agnostic LSP core: Works with any Kotodama language definition
Text document synchronization: Full incremental sync support
Diagnostics: Real-time error reporting using lexer/parser errors
Extensible architecture: Easy to add language-specific features
JSON support: Full diagnostics for JSON files
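
As a rough illustration of how a lexer or parser error could become an LSP diagnostic, the sketch below converts a 1-based line/column error into the zero-based range the protocol expects. ParseError and ToDiagnostic are hypothetical names, not the lsp package's API; the JSON field names and the severity value 1 (Error) follow the LSP specification.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Position, Range, and Diagnostic mirror the LSP wire format.
type Position struct {
	Line      int `json:"line"`      // zero-based
	Character int `json:"character"` // zero-based
}

type Range struct {
	Start Position `json:"start"`
	End   Position `json:"end"`
}

type Diagnostic struct {
	Range    Range  `json:"range"`
	Severity int    `json:"severity"` // 1 = Error in the LSP specification
	Source   string `json:"source"`
	Message  string `json:"message"`
}

// ParseError is a hypothetical error shape with 1-based line and column.
type ParseError struct {
	Line, Column, Length int
	Msg                  string
}

// ToDiagnostic shifts 1-based positions to the zero-based positions LSP uses.
// It assumes ASCII text; by default LSP counts characters in UTF-16 code
// units, so real code would need to convert offsets for non-ASCII lines.
func ToDiagnostic(e ParseError) Diagnostic {
	start := Position{Line: e.Line - 1, Character: e.Column - 1}
	end := Position{Line: start.Line, Character: start.Character + e.Length}
	return Diagnostic{
		Range:    Range{Start: start, End: end},
		Severity: 1,
		Source:   "kotodama",
		Message:  e.Msg,
	}
}

func main() {
	d := ToDiagnostic(ParseError{Line: 3, Column: 5, Length: 2, Msg: "unexpected token"})
	out, err := json.MarshalIndent(d, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```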

🔄 In Progress / Next Steps

Complete Go code generator (AST type generation, visitor pattern)
YAML grammar definition (parsing YAML with Kotodama itself) ✓
AST transformation engine with pattern matching ✓
LSP server framework ✓
Common language definitions (TOML, INI, CSV)

🏗️ Architecture

Core Components

kotodama/
├── token/          # Universal tokenizer (boolean flags)
├── lexer/         # Lexer generator (YAML → DFA)
├── parser/        # Parser generator (YAML → parser)
├── ast/           # Unified AST (BSON serializable)
├── transform/     # AST pattern matching and transformation
├── interpreter/   # Tree-walk interpreter
├── bytecode/      # IR/bytecode format
├── generator/     # Code generator (YAML → Go)
├── transpile/     # Source-to-source translation
├── grammar/       # Grammar definition loader
├── lsp/           # Language Server Protocol server
└── cmd/kotodama/  # CLI tool

AST Transformation Engine

The transform package provides powerful pattern matching and rewrite capabilities for AST manipulation:

import (
    "strings"

    "git.lan.thwap.org/thwap/kotodama/ast"
    "git.lan.thwap.org/thwap/kotodama/transform"
)

factory := &ast.NodeFactory{}
transformer := transform.NewTransformer(factory)

// Add a rule to convert string values to uppercase
transformer.AddRule(&transform.Rule{
    Name: "uppercase_strings",
    Pattern: transform.PatternOfType("String"),
    Transform: func(match *transform.MatchResult) *ast.Node {
        val, _ := match.Node.Data["value"].(string)
        upper := strings.ToUpper(val)
        return factory.CreateNode("String", map[string]interface{}{
            "value": upper,
        }, nil)
    },
    Priority: 10,
})

// Apply transformations (bottom-up by default)
transformedAST := transformer.Transform(originalAST)

Features:

Pattern Matching: Match nodes by type, data fields, and child structure
Capture Groups: Extract matched nodes for use in transformations
Rule Priorities: Control application order with priority levels
Multiple Traversal Strategies: Bottom-up, top-down, and visitor-based
Wildcard Support: Regex patterns for node types and values
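
The bottom-up, priority-ordered strategy listed above can be sketched without the library. Everything in this example (Node, Rule, Transform, UppercaseStrings) is a stand-in written for illustration, not the transform package's implementation, and it assumes that a higher Priority value means the rule is tried first.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Node is a simplified stand-in for an AST node.
type Node struct {
	Type     string
	Data     map[string]interface{}
	Children []*Node
}

// Rule pairs a match predicate with a rewrite function.
type Rule struct {
	Name     string
	Priority int // assumption: higher values are tried first
	Match    func(*Node) bool
	Rewrite  func(*Node) *Node
}

// Transform rewrites the tree bottom-up without mutating the input nodes.
func Transform(n *Node, rules []Rule) *Node {
	if n == nil {
		return nil
	}
	sorted := append([]Rule(nil), rules...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].Priority > sorted[j].Priority
	})
	return transform(n, sorted)
}

func transform(n *Node, rules []Rule) *Node {
	out := &Node{Type: n.Type, Data: n.Data}
	// Children are rewritten before their parent (bottom-up).
	for _, c := range n.Children {
		out.Children = append(out.Children, transform(c, rules))
	}
	// The first matching rule in priority order wins.
	for _, r := range rules {
		if r.Match(out) {
			return r.Rewrite(out)
		}
	}
	return out
}

// UppercaseStrings mirrors the "uppercase_strings" rule from the README example.
func UppercaseStrings() Rule {
	return Rule{
		Name:     "uppercase_strings",
		Priority: 10,
		Match:    func(n *Node) bool { return n.Type == "String" },
		Rewrite: func(n *Node) *Node {
			val, _ := n.Data["value"].(string)
			return &Node{Type: "String", Data: map[string]interface{}{
				"value": strings.ToUpper(val),
			}}
		},
	}
}

func main() {
	tree := &Node{Type: "Array", Children: []*Node{
		{Type: "String", Data: map[string]interface{}{"value": "hello"}},
		{Type: "Number", Data: map[string]interface{}{"value": 42}},
	}}
	out := Transform(tree, []Rule{UppercaseStrings()})
	fmt.Println(out.Children[0].Data["value"], out.Children[1].Data["value"])
}
```

Rewriting children first means a parent-level rule sees already-transformed children, which is usually what a simplification pass wants; a top-down strategy would apply the rule loop before the recursion instead.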

Three Operational Modes

Framework Mode: Library for embedding in Go applications
Code Generator Mode: a YACC-style generator that emits lexer and parser source code from YAML definitions (Go is the target implemented so far)
Transpiler Mode: Language-to-language translation

Key Innovations

Boolean Flag Token Design

type Token struct {
    Pos     Position  // 8 bytes: offset, line, column
    Span    int16     // Token length in runes
    Literal []byte    // Token text (slice into source)
    Type    TokenType // 2 bytes: category + subtype
}

// Fast boolean checks via bitmask operations
func (t Token) IsKeyword() bool { return t.Type & CategoryMask == TkKeyword }
func (t Token) IsIdentifier() bool { return t.Type & CategoryMask == TkIdentifier }
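
For readers who want to see how the category check works end to end, here is a self-contained sketch. The 4-bit category / 12-bit subtype split and the concrete constant values are assumptions chosen for illustration; the README only states that TokenType is 2 bytes holding a category and a subtype.

```go
package main

import "fmt"

// TokenType packs a category and a subtype into 16 bits.
// The 4/12 bit split is an assumption made for this sketch.
type TokenType uint16

const (
	CategoryMask TokenType = 0xF000 // top 4 bits select the category
	SubtypeMask  TokenType = 0x0FFF // low 12 bits select the subtype

	TkKeyword    TokenType = 0x1000
	TkIdentifier TokenType = 0x2000

	// Hypothetical subtypes inside the keyword category.
	TkFunc = TkKeyword | 0x001
	TkVar  = TkKeyword | 0x002
)

// Token is trimmed to the fields this example needs.
type Token struct {
	Literal []byte
	Type    TokenType
}

// Each check is a single AND plus a compare, with no branching on subtype.
func (t Token) IsKeyword() bool    { return t.Type&CategoryMask == TkKeyword }
func (t Token) IsIdentifier() bool { return t.Type&CategoryMask == TkIdentifier }
func (t Token) Subtype() TokenType { return t.Type & SubtypeMask }

func main() {
	tok := Token{Literal: []byte("func"), Type: TkFunc}
	fmt.Println(tok.IsKeyword(), tok.IsIdentifier(), tok.Subtype())
}
```

Because every keyword shares the same category bits, a parser can test "is this any keyword" without enumerating the individual keyword subtypes.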

Unified AST Format (BSON Serializable)

type Node struct {
    ID       string                 `bson:"id"`      // UUID for referencing
    Type     NodeType               `bson:"type"`    // Node type
    Span     Span                   `bson:"span"`    // Source location
    Data     map[string]interface{} `bson:"data"`    // Node-specific data
    Children []*Node                `bson:"children,omitempty"`
    Metadata map[string]interface{} `bson:"meta,omitempty"`
}

🚀 Getting Started

Prerequisites

Go 1.21+ (currently using 1.24.11)
Graphviz (optional, for AST visualization)

Installation

# Clone the repository
git clone https://git.lan.thwap.org/thwap/kotodama.git
cd kotodama

# Build the CLI tool
make build
# or
go build ./cmd/kotodama

# Run tests
make test

Quick Examples

Interactive REPL

./kotodama repl
> 1 + 2 * 3
7
> print("Hello, " + "world!")
Hello, world!

Execute a Script

#!/usr/bin/env -S kotodama run
# test.ktdm
print("1 + 2 * 3 =", 1 + 2 * 3)
print("Length of 'hello':", len("hello"))

./kotodama run test.ktdm
# Output:
# 1 + 2 * 3 = 7
# Length of 'hello': 5

JSON ↔ YAML Transpilation

# JSON to YAML
echo '{"name": "John", "age": 30}' | ./kotodama json2yaml
# Output:
# age: 30
# name: John

# YAML to JSON
printf 'name: John\nage: 30\n' | ./kotodama yaml2json
# Output:
# {
#   "age": 30,
#   "name": "John"
# }

AST Visualization

# Create a simple script
echo 'print(1 + 2 * 3)' > example.ktdm

# Generate Graphviz DOT output
./kotodama ast example.ktdm > ast.dot

# Convert to image (requires Graphviz)
dot -Tpng ast.dot -o ast.png
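
For orientation, the sketch below shows one way a tree can be written out in DOT format. It is not Kotodama's ast command: the Node type, the ToDOT function, and the n0, n1, ... node naming are made up for this example, and the actual output layout may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// Node is a minimal labeled tree used only for this sketch.
type Node struct {
	Label    string
	Children []*Node
}

// ToDOT renders the tree as a Graphviz digraph, numbering nodes in pre-order.
// Labels are quoted with %q, which is adequate for plain ASCII labels.
func ToDOT(root *Node) string {
	var b strings.Builder
	b.WriteString("digraph AST {\n")
	id := 0
	var walk func(n *Node) int
	walk = func(n *Node) int {
		me := id
		id++
		fmt.Fprintf(&b, "  n%d [label=%q];\n", me, n.Label)
		for _, c := range n.Children {
			child := walk(c)
			fmt.Fprintf(&b, "  n%d -> n%d;\n", me, child)
		}
		return me
	}
	if root != nil {
		walk(root)
	}
	b.WriteString("}\n")
	return b.String()
}

func main() {
	// 1 + 2 * 3 as a tree: the multiplication binds tighter, so it sits deeper.
	tree := &Node{Label: "+", Children: []*Node{
		{Label: "1"},
		{Label: "*", Children: []*Node{{Label: "2"}, {Label: "3"}}},
	}}
	fmt.Print(ToDOT(tree))
}
```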

📚 Language Definitions

JSON Grammar

Complete JSON parser implementation in grammar/json.yaml:

Objects, arrays, strings, numbers, booleans, null
Uses Kotodama's lexer and AST
Integrated into JSON↔YAML transpiler

Creating New Languages

Define languages in YAML format:

# example.yaml
name: "Example Language"
version: "1.0"

lexer:
  tokens:
    identifier: "[a-zA-Z_][a-zA-Z0-9_]*"
    number: "[0-9]+"
  
  keywords:
    func: TkFunc
    var: TkVar
  
  operators:
    "+": TkAdd
    "-": TkSub

grammar:
  start: "Program"
  rules:
    Program: "Statement*"
    Statement: "Expression ;"
    Expression: "Term (('+' | '-') Term)*"
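
To show what the two token patterns in this definition amount to once compiled, here is a hand-written DFA for identifier ([a-zA-Z_][a-zA-Z0-9_]*) and number ([0-9]+). It is a sketch of the general technique; the state names and the Next function are invented here, and the code Kotodama actually generates from the YAML is not shown in this README.

```go
package main

import "fmt"

type state int

const (
	start state = iota
	inIdent
	inNumber
	dead
)

// step is the DFA transition function for the two patterns.
func step(s state, c byte) state {
	isLetter := c == '_' || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
	isDigit := c >= '0' && c <= '9'
	switch s {
	case start:
		if isLetter {
			return inIdent
		}
		if isDigit {
			return inNumber
		}
	case inIdent:
		if isLetter || isDigit {
			return inIdent
		}
	case inNumber:
		if isDigit {
			return inNumber
		}
	}
	return dead
}

// Next returns the kind and byte length of the longest token at the start of
// src, or ("", 0) if neither pattern matches. Both non-start states are
// accepting, so the longest match ends where the DFA would enter dead.
func Next(src string) (kind string, n int) {
	s := start
	for i := 0; i < len(src); i++ {
		next := step(s, src[i])
		if next == dead {
			break
		}
		s = next
		n = i + 1
	}
	switch s {
	case inIdent:
		return "identifier", n
	case inNumber:
		return "number", n
	}
	return "", 0
}

func main() {
	for _, src := range []string{"foo_1+2", "42abc", "+x"} {
		kind, n := Next(src)
		fmt.Printf("%q -> kind=%q len=%d\n", src, kind, n)
	}
}
```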

🛠️ API Usage

Embedding in Go Applications

package main

import (
    "fmt"
    "git.lan.thwap.org/thwap/kotodama/ast"
    "git.lan.thwap.org/thwap/kotodama/interpreter"
    "git.lan.thwap.org/thwap/kotodama/lexer"
    "git.lan.thwap.org/thwap/kotodama/parser"
)

func main() {
    src := []byte("1 + 2 * 3")
    l := lexer.NewFromBytes(src)
    factory := &ast.NodeFactory{}
    p := parser.NewPrattParser(l, &parser.Config{Mode: parser.ModePratt}, factory)
    node, err := p.Parse()
    if err != nil {
        panic(err)
    }
    
    env := interpreter.NewEnvironment()
    eval := interpreter.NewEvaluator(env)
    result, err := eval.Evaluate(node)
    if err != nil {
        panic(err)
    }
    fmt.Println(result) // Output: 7
}
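
The result is 7 rather than 9 because multiplication binds tighter than addition. The following standalone sketch shows precedence climbing, the technique the interpreter's commit history mentions, on integer arithmetic. It does not use Kotodama's parser; the precedence table and the Eval function are assumptions made for the example.

```go
package main

import (
	"fmt"
	"strconv"
)

// prec maps each binary operator to its binding power (higher binds tighter).
var prec = map[byte]int{'+': 1, '-': 1, '*': 2, '/': 2}

type parser struct {
	src string
	pos int
}

func (p *parser) skipSpace() {
	for p.pos < len(p.src) && p.src[p.pos] == ' ' {
		p.pos++
	}
}

// primary parses one non-negative integer literal.
func (p *parser) primary() (int, error) {
	p.skipSpace()
	start := p.pos
	for p.pos < len(p.src) && p.src[p.pos] >= '0' && p.src[p.pos] <= '9' {
		p.pos++
	}
	if start == p.pos {
		return 0, fmt.Errorf("expected number at offset %d", start)
	}
	return strconv.Atoi(p.src[start:p.pos])
}

// expr consumes operators whose precedence is at least minPrec.
func (p *parser) expr(minPrec int) (int, error) {
	lhs, err := p.primary()
	if err != nil {
		return 0, err
	}
	for {
		p.skipSpace()
		if p.pos >= len(p.src) {
			return lhs, nil
		}
		op := p.src[p.pos]
		pr, ok := prec[op]
		if !ok {
			return 0, fmt.Errorf("unexpected %q at offset %d", op, p.pos)
		}
		if pr < minPrec {
			return lhs, nil
		}
		p.pos++
		// pr+1 makes the operators left-associative.
		rhs, err := p.expr(pr + 1)
		if err != nil {
			return 0, err
		}
		switch op {
		case '+':
			lhs += rhs
		case '-':
			lhs -= rhs
		case '*':
			lhs *= rhs
		case '/':
			if rhs == 0 {
				return 0, fmt.Errorf("division by zero")
			}
			lhs /= rhs
		}
	}
}

// Eval parses and evaluates an integer expression in one pass.
func Eval(src string) (int, error) {
	p := &parser{src: src}
	return p.expr(1)
}

func main() {
	result, err := Eval("1 + 2 * 3")
	if err != nil {
		panic(err)
	}
	fmt.Println(result)
}
```

A production parser would build AST nodes instead of computing values inline, but the control flow that decides which operator claims which operand is the same.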

Using the Code Generator

package main

import (
    "fmt"
    "git.lan.thwap.org/thwap/kotodama/generator"
    "git.lan.thwap.org/thwap/kotodama/grammar"
    "os"
)

func main() {
    // Load grammar from YAML, stopping on the first error
    data, err := os.ReadFile("grammar/json.yaml")
    if err != nil {
        panic(err)
    }
    g, err := grammar.LoadFromYAML(data)
    if err != nil {
        panic(err)
    }
    
    // Generate Go parser code
    code, err := generator.GenerateRecursiveDescent(g, generator.Config{
        PackageName: "jsonparser",
        Visitor:     true,
    })
    if err != nil {
        panic(err)
    }
    
    fmt.Println(code)
}

🧪 Testing

# Run all tests
make test

# Run specific test suites
make test-lexer
make test-parser
make test-generator
make test-interpreter

# Run with verbose output
make test-verbose

# Generate coverage report
make coverage
# Open coverage.html in browser

📊 Performance

Tokenizer Targets

< 100ns per token on average
Zero allocations for common token paths
< 50MB memory for 1MB source file
Support for files > 1GB via streaming

Current Benchmarks

Run benchmarks with:

make bench

🗺️ Roadmap

See ROADMAP.md for detailed planning.

Immediate Goals (Next 1-2 weeks)

Complete Go code generator with AST type generation
YAML grammar definition (parsing YAML with Kotodama) ✓
AST transformation engine with pattern matching ✓

Phase 2: Tooling Ecosystem (Months 4-6)

LSP server framework ✓
Code formatter
Debugger interface
Plugin system

Phase 3: Language Expansion (Months 7-12)

Shell implementations (bash, fish, PowerShell)
Programming language subsets (Go, Python, JavaScript)
SQL dialects (PostgreSQL, MySQL, SQLite)

👥 Contributing

Kotodama is a 100% free and open source project.

For Developers

Go 1.21+ required
Test-driven development required
Performance benchmarks for critical paths
Documentation for public APIs

For Language Designers

Create YAML language definitions
Write transpilation rules
Add examples and documentation

For Users

Bug reports with minimal reproduction
Feature requests with use cases
Documentation improvements

📄 License

Apache 2.0 for all code
Creative Commons for documentation

📞 Contact & Resources

Source Code: git.lan.thwap.org/thwap/kotodama
Documentation: In this README and code comments

Kotodama – Simplify language tooling to the point where creating a new language takes hours, not months.
