Kotodama is a universal language toolchain framework written in Go that enables creating new programming languages, transpilers, and LSP tooling within hours by defining them in YAML, featuring a unified AST format with boolean-optimized tokens and supporting multiple operational modes including framework embedding, code generation, and source-to-source translation.
Kotodama Language Toolkit
A universal language toolchain framework written in Go that enables:
- Multi-modal operation (framework, code generator, transpiler)
- Polyglot execution through YAML/JSON language definitions
- Unified AST format with boolean token optimization
- Complete compiler/interpreter/LSP tooling ecosystem
- 100% Free and Open Source
Kotodama (言霊) means "word spirit" in Japanese – the belief that words have power to shape reality.
🚀 Current Status
Phase 1: Foundation is ~85% complete with all core components implemented and tested.
✅ Completed Features
Core Infrastructure
- Universal Tokenizer: Boolean flag token design with fast bitmask operations
- Lexer Core Engine: Streaming lexer with error recovery and lookahead
- DFA/NFA Pattern Matching: Regex-to-DFA compiler for YAML patterns
- Unified AST Format: BSON-serializable nodes with source positions
- Grammar Definition System: YAML loader with EBNF/BNF support
- YAML Grammar Definition: Complete YAML grammar definition (grammar/yaml.yaml) for parsing YAML with Kotodama itself (meta-circular evaluation)
- AST Transformation Engine: Pattern matching and rewrite rules for AST manipulation
- Pattern Matching: Type-based, data field, and child structure matching
- Rewrite Rules: Priority-based rule application with capture groups
- Visitor Integration: Works with existing visitor pattern for custom traversals
Execution Engines
- Tree-Walk Interpreter: Expression evaluator with native function binding
- Bytecode VM: Stack-based bytecode with debug information
- Go Code Generator: Generate lexer/parser from YAML definitions
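To make the "stack-based bytecode with debug information" idea concrete, here is a minimal, self-contained sketch. It does not use Kotodama's bytecode package; the opcode names, the `Instr` layout, and the `run` function are all hypothetical and only illustrate the general technique.

```go
package main

import "fmt"

// Op is a hypothetical opcode type; Kotodama's real instruction set may differ.
type Op byte

const (
	OpPush Op = iota // push Arg onto the stack
	OpAdd            // pop two values, push their sum
	OpMul            // pop two values, push their product
)

// Instr carries a source line as a stand-in for debug information.
type Instr struct {
	Op   Op
	Arg  int
	Line int
}

// run executes a program on an operand stack and returns the final value.
func run(prog []Instr) (int, error) {
	var stack []int
	for pc, in := range prog {
		switch in.Op {
		case OpPush:
			stack = append(stack, in.Arg)
		case OpAdd, OpMul:
			if len(stack) < 2 {
				return 0, fmt.Errorf("pc %d (line %d): stack underflow", pc, in.Line)
			}
			a, b := stack[len(stack)-2], stack[len(stack)-1]
			stack = stack[:len(stack)-2]
			if in.Op == OpAdd {
				stack = append(stack, a+b)
			} else {
				stack = append(stack, a*b)
			}
		default:
			return 0, fmt.Errorf("pc %d (line %d): unknown opcode %d", pc, in.Line, in.Op)
		}
	}
	if len(stack) != 1 {
		return 0, fmt.Errorf("expected one result on the stack, found %d", len(stack))
	}
	return stack[0], nil
}

func main() {
	// Bytecode for 1 + 2 * 3: the multiplication is emitted before the addition.
	prog := []Instr{
		{OpPush, 1, 1}, {OpPush, 2, 1}, {OpPush, 3, 1},
		{OpMul, 0, 1}, {OpAdd, 0, 1},
	}
	result, err := run(prog)
	if err != nil {
		panic(err)
	}
	fmt.Println(result) // prints 7
}
```

Carrying the line number on every instruction is the simplest way to report runtime errors against source positions; a real VM would more likely keep a separate, compact line table.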
Practical Tools
- JSON↔YAML Transpiler: Complete JSON parser using Kotodama's lexer/AST
- Enhanced CLI Tool: Subcommand architecture with:
  - repl: Interactive REPL
  - run <file>: Execute Kotodama scripts (with shebang support)
  - ast <file>: Output AST in Graphviz DOT format
  - json2yaml/yaml2json: Data format conversion
  - lsp: Start Language Server Protocol server
  - version: Version information
LSP Server Framework
- Language-agnostic LSP core: Works with any Kotodama language definition
- Text document synchronization: Full incremental sync support
- Diagnostics: Real-time error reporting using lexer/parser errors
- Extensible architecture: Easy to add language-specific features
- JSON support: Full diagnostics for JSON files
🔄 In Progress / Next Steps
- Complete Go code generator (AST type generation, visitor pattern)
- YAML grammar definition (parsing YAML with Kotodama itself) ✓
- AST transformation engine with pattern matching ✓
- LSP server framework ✓
- Common language definitions (TOML, INI, CSV)
🏗️ Architecture
Core Components
kotodama/
├── token/ # Universal tokenizer (boolean flags)
├── lexer/ # Lexer generator (YAML → DFA)
├── parser/ # Parser generator (YAML → parser)
├── ast/ # Unified AST (BSON serializable)
├── transform/ # AST pattern matching and transformation
├── interpreter/ # Tree-walk interpreter
├── bytecode/ # IR/bytecode format
├── generator/ # Code generator (YAML → Go)
├── transpile/ # Source-to-source translation
├── grammar/ # Grammar definition loader
├── lsp/ # Language Server Protocol server
└── cmd/kotodama/ # CLI tool
AST Transformation Engine
The transform package provides powerful pattern matching and rewrite capabilities for AST manipulation:
import (
	"strings"

	"git.lan.thwap.org/thwap/kotodama/ast"
	"git.lan.thwap.org/thwap/kotodama/transform"
)

factory := &ast.NodeFactory{}
transformer := transform.NewTransformer(factory)

// Add a rule to convert string values to uppercase
transformer.AddRule(&transform.Rule{
	Name:    "uppercase_strings",
	Pattern: transform.PatternOfType("String"),
	Transform: func(match *transform.MatchResult) *ast.Node {
		val, _ := match.Node.Data["value"].(string)
		upper := strings.ToUpper(val)
		return factory.CreateNode("String", map[string]interface{}{
			"value": upper,
		}, nil)
	},
	Priority: 10,
})

// Apply transformations (bottom-up by default)
transformedAST := transformer.Transform(originalAST)
Features:
- Pattern Matching: Match nodes by type, data fields, and child structure
- Capture Groups: Extract matched nodes for use in transformations
- Rule Priorities: Control application order with priority levels
- Multiple Traversal Strategies: Bottom-up, top-down, and visitor-based
- Wildcard Support: Regex patterns for node types and values
Three Operational Modes
- Framework Mode: Library for embedding in Go applications
- Code Generator Mode: YACC 2.0 – generates code in target languages
- Transpiler Mode: Language-to-language translation
Key Innovations
Boolean Flag Token Design
type Token struct {
	Pos     Position  // 8 bytes: offset, line, column
	Span    int16     // Token length in runes
	Literal []byte    // Token text (slice into source)
	Type    TokenType // 2 bytes: category + subtype
}

// Fast boolean checks via bitmask operations
func (t Token) IsKeyword() bool    { return t.Type&CategoryMask == TkKeyword }
func (t Token) IsIdentifier() bool { return t.Type&CategoryMask == TkIdentifier }
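The snippet above omits the constant definitions. A minimal runnable sketch of one possible layout follows, with the category in the high byte and the subtype in the low byte. The bit layout, the numeric values, and the `Subtype` helper are assumptions for illustration and may not match Kotodama's token package.

```go
package main

import "fmt"

// TokenType packs a category (high byte) and a subtype (low byte) into 2 bytes.
// This layout is an assumption, not Kotodama's documented encoding.
type TokenType uint16

const (
	CategoryMask TokenType = 0xFF00
	SubtypeMask  TokenType = 0x00FF

	TkKeyword    TokenType = 0x0100
	TkIdentifier TokenType = 0x0200
	TkOperator   TokenType = 0x0300

	// Hypothetical subtypes inside the keyword category.
	TkFunc = TkKeyword | 0x01
	TkVar  = TkKeyword | 0x02
)

// Category checks are a single AND plus a comparison.
func (t TokenType) IsKeyword() bool    { return t&CategoryMask == TkKeyword }
func (t TokenType) IsIdentifier() bool { return t&CategoryMask == TkIdentifier }
func (t TokenType) Subtype() TokenType { return t & SubtypeMask }

func main() {
	fmt.Println(TkFunc.IsKeyword(), TkFunc.IsIdentifier(), TkVar.Subtype()) // prints true false 2
}
```

Because every keyword shares the same high byte, asking "is this any keyword?" costs the same as asking "is this exactly `func`?", which is the point of the design.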
Unified AST Format (BSON Serializable)
type Node struct {
	ID       string                 `bson:"id"`   // UUID for referencing
	Type     NodeType               `bson:"type"` // Node type
	Span     Span                   `bson:"span"` // Source location
	Data     map[string]interface{} `bson:"data"` // Node-specific data
	Children []*Node                `bson:"children,omitempty"`
	Metadata map[string]interface{} `bson:"meta,omitempty"`
}
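A serializable node format means a whole tree can be written out and read back without losing structure. The sketch below shows that round trip with a trimmed-down node type. It uses `encoding/json` from the standard library as a stand-in for BSON (which needs a third-party package), and the field set, `sampleTree`, and `roundTrip` are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Node is a reduced version of the struct above, with JSON tags instead of BSON tags.
type Node struct {
	ID       string                 `json:"id"`
	Type     string                 `json:"type"`
	Data     map[string]interface{} `json:"data,omitempty"`
	Children []*Node                `json:"children,omitempty"`
}

// sampleTree builds the tree for 1 + 2 * 3.
func sampleTree() *Node {
	num := func(id, text string) *Node {
		return &Node{ID: id, Type: "Number", Data: map[string]interface{}{"text": text}}
	}
	mul := &Node{ID: "n3", Type: "Binary", Data: map[string]interface{}{"op": "*"},
		Children: []*Node{num("n4", "2"), num("n5", "3")}}
	return &Node{ID: "n1", Type: "Binary", Data: map[string]interface{}{"op": "+"},
		Children: []*Node{num("n2", "1"), mul}}
}

// countNodes returns the number of nodes in the tree rooted at n.
func countNodes(n *Node) int {
	if n == nil {
		return 0
	}
	total := 1
	for _, c := range n.Children {
		total += countNodes(c)
	}
	return total
}

// roundTrip serializes a tree and decodes it into a fresh copy.
func roundTrip(n *Node) (*Node, error) {
	raw, err := json.Marshal(n)
	if err != nil {
		return nil, err
	}
	var out Node
	if err := json.Unmarshal(raw, &out); err != nil {
		return nil, err
	}
	return &out, nil
}

func main() {
	copied, err := roundTrip(sampleTree())
	if err != nil {
		panic(err)
	}
	fmt.Println(countNodes(copied), copied.Children[1].Data["op"]) // prints 5 *
}
```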
🚀 Getting Started
Prerequisites
- Go 1.21+ (currently using 1.24.11)
- Graphviz (optional, for AST visualization)
Installation
# Clone the repository
git clone https://git.lan.thwap.org/thwap/kotodama.git
cd kotodama
# Build the CLI tool
make build
# or
go build ./cmd/kotodama
# Run tests
make test
Quick Examples
Interactive REPL
./kotodama repl
> 1 + 2 * 3
7
> print("Hello, " + "world!")
Hello, world!
Execute a Script
#!/usr/bin/env kotodama run
# test.ktdm
print("1 + 2 * 3 =", 1 + 2 * 3)
print("Length of 'hello':", len("hello"))
./kotodama run test.ktdm
# Output:
# 1 + 2 * 3 = 7
# Length of 'hello': 5
JSON ↔ YAML Transpilation
# JSON to YAML
echo '{"name": "John", "age": 30}' | ./kotodama json2yaml
# Output:
# age: 30
# name: John
# YAML to JSON
echo -e 'name: John\nage: 30' | ./kotodama yaml2json
# Output:
# {
# "age": 30,
# "name": "John"
# }
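Note that both sample outputs list keys alphabetically rather than in input order. One way such ordering can arise is decoding objects into a Go map and sorting the keys before emitting. The sketch below demonstrates that idea for flat objects only, using the standard library; it is not how Kotodama's transpiler is known to work, and `flatJSONToYAML` is a hypothetical name.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strings"
)

// flatJSONToYAML converts a flat JSON object to YAML with sorted keys.
// Nested values are rejected and strings are emitted unquoted, so this is
// only suitable for simple scalars like those in the example above.
func flatJSONToYAML(src []byte) (string, error) {
	var obj map[string]interface{}
	if err := json.Unmarshal(src, &obj); err != nil {
		return "", err
	}
	keys := make([]string, 0, len(obj))
	for k := range obj {
		keys = append(keys, k)
	}
	sort.Strings(keys) // Go maps are unordered, so pick a deterministic order

	var b strings.Builder
	for _, k := range keys {
		switch v := obj[k].(type) {
		case nil:
			fmt.Fprintf(&b, "%s: null\n", k)
		case string, float64, bool:
			fmt.Fprintf(&b, "%s: %v\n", k, v)
		default:
			return "", fmt.Errorf("key %q: nested values are outside this sketch", k)
		}
	}
	return b.String(), nil
}

func main() {
	out, err := flatJSONToYAML([]byte(`{"name": "John", "age": 30}`))
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```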
AST Visualization
# Create a simple script
echo 'print(1 + 2 * 3)' > example.ktdm
# Generate Graphviz DOT output
./kotodama ast example.ktdm > ast.dot
# Convert to image (requires Graphviz)
dot -Tpng ast.dot -o ast.png
📚 Language Definitions
JSON Grammar
Complete JSON parser implementation in grammar/json.yaml:
- Objects, arrays, strings, numbers, booleans, null
- Uses Kotodama's lexer and AST
- Integrated into JSON↔YAML transpiler
Creating New Languages
Define languages in YAML format:
# example.yaml
name: "Example Language"
version: "1.0"
lexer:
  tokens:
    identifier: "[a-zA-Z_][a-zA-Z0-9_]*"
    number: "[0-9]+"
  keywords:
    func: TkFunc
    var: TkVar
  operators:
    "+": TkAdd
    "-": TkSub
grammar:
  start: "Program"
  rules:
    Program: "Statement*"
    Statement: "Expression ;"
    Expression: "Term (('+' | '-') Term)*"
🛠️ API Usage
Embedding in Go Applications
package main

import (
	"fmt"

	"git.lan.thwap.org/thwap/kotodama/ast"
	"git.lan.thwap.org/thwap/kotodama/interpreter"
	"git.lan.thwap.org/thwap/kotodama/lexer"
	"git.lan.thwap.org/thwap/kotodama/parser"
)

func main() {
	src := []byte("1 + 2 * 3")
	l := lexer.NewFromBytes(src)
	factory := &ast.NodeFactory{}
	p := parser.NewPrattParser(l, &parser.Config{Mode: parser.ModePratt}, factory)
	node, err := p.Parse()
	if err != nil {
		panic(err)
	}

	env := interpreter.NewEnvironment()
	eval := interpreter.NewEvaluator(env)
	result, err := eval.Evaluate(node)
	if err != nil {
		panic(err)
	}
	fmt.Println(result) // Output: 7
}
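The result is 7 rather than 9 because the parser gives `*` a higher binding power than `+`. The standalone sketch below shows that mechanism with a small precedence-climbing (Pratt-style) evaluator over integers. It evaluates directly instead of building an AST and shares no code with Kotodama's parser package; the `parser` type, the `precedence` table, and `eval` are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
)

type parser struct {
	src string
	pos int
}

// Higher numbers bind tighter.
var precedence = map[byte]int{'+': 1, '-': 1, '*': 2, '/': 2}

func (p *parser) skipSpace() {
	for p.pos < len(p.src) && p.src[p.pos] == ' ' {
		p.pos++
	}
}

// parsePrimary reads one non-negative integer literal.
func (p *parser) parsePrimary() (int, error) {
	p.skipSpace()
	start := p.pos
	for p.pos < len(p.src) && p.src[p.pos] >= '0' && p.src[p.pos] <= '9' {
		p.pos++
	}
	if start == p.pos {
		return 0, fmt.Errorf("expected number at offset %d", start)
	}
	return strconv.Atoi(p.src[start:p.pos])
}

// parseExpr consumes operators whose precedence is at least minPrec.
// Recursing with prec+1 makes every operator left-associative.
func (p *parser) parseExpr(minPrec int) (int, error) {
	left, err := p.parsePrimary()
	if err != nil {
		return 0, err
	}
	for {
		p.skipSpace()
		if p.pos >= len(p.src) {
			return left, nil
		}
		op := p.src[p.pos]
		prec, ok := precedence[op]
		if !ok {
			return 0, fmt.Errorf("unexpected %q at offset %d", op, p.pos)
		}
		if prec < minPrec {
			return left, nil
		}
		p.pos++
		right, err := p.parseExpr(prec + 1)
		if err != nil {
			return 0, err
		}
		switch op {
		case '+':
			left += right
		case '-':
			left -= right
		case '*':
			left *= right
		case '/':
			if right == 0 {
				return 0, fmt.Errorf("division by zero")
			}
			left /= right
		}
	}
}

func eval(src string) (int, error) {
	p := &parser{src: src}
	return p.parseExpr(1)
}

func main() {
	result, err := eval("1 + 2 * 3")
	if err != nil {
		panic(err)
	}
	fmt.Println(result) // prints 7
}
```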
Using the Code Generator
package main

import (
	"fmt"
	"os"

	"git.lan.thwap.org/thwap/kotodama/generator"
	"git.lan.thwap.org/thwap/kotodama/grammar"
)

func main() {
	// Load grammar from YAML
	data, err := os.ReadFile("grammar/json.yaml")
	if err != nil {
		panic(err)
	}
	g, err := grammar.LoadFromYAML(data)
	if err != nil {
		panic(err)
	}

	// Generate Go parser code
	code, err := generator.GenerateRecursiveDescent(g, generator.Config{
		PackageName: "jsonparser",
		Visitor:     true,
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(code)
}
🧪 Testing
# Run all tests
make test
# Run specific test suites
make test-lexer
make test-parser
make test-generator
make test-interpreter
# Run with verbose output
make test-verbose
# Generate coverage report
make coverage
# Open coverage.html in browser
📊 Performance
Tokenizer Targets
- < 100ns per token on average
- Zero allocations for common token paths
- < 50MB memory for 1MB source file
- Support for files > 1GB via streaming
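The zero-allocation target is the kind of property that can be checked mechanically. The sketch below uses `testing.AllocsPerRun` from the standard library on a toy whitespace tokenizer that only slices into the source buffer. `countTokens` is a hypothetical stand-in and says nothing about the real lexer's numbers.

```go
package main

import (
	"fmt"
	"testing"
)

// countTokens splits src on spaces and counts the pieces.
// Each token is a sub-slice of src, so no token text is copied.
func countTokens(src []byte) int {
	n := 0
	for i := 0; i < len(src); {
		if src[i] == ' ' {
			i++
			continue
		}
		start := i
		for i < len(src) && src[i] != ' ' {
			i++
		}
		_ = src[start:i] // token text as a view into the source
		n++
	}
	return n
}

func main() {
	src := []byte("var x + 42")
	// AllocsPerRun reports the average number of heap allocations per call.
	allocs := testing.AllocsPerRun(1000, func() { countTokens(src) })
	fmt.Printf("tokens=%d allocs/run=%v\n", countTokens(src), allocs)
}
```

Per-token timing is better measured with a regular `go test -bench` benchmark, which is presumably what `make bench` wraps.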
Current Benchmarks
Run benchmarks with:
make bench
🗺️ Roadmap
See ROADMAP.md for detailed planning.
Immediate Goals (Next 1-2 weeks)
- Complete Go code generator with AST type generation
- YAML grammar definition (parsing YAML with Kotodama) ✓
- AST transformation engine with pattern matching ✓
Phase 2: Tooling Ecosystem (Months 4-6)
- LSP server framework ✓
- Code formatter
- Debugger interface
- Plugin system
Phase 3: Language Expansion (Months 7-12)
- Shell implementations (bash, fish, PowerShell)
- Programming language subsets (Go, Python, JavaScript)
- SQL dialects (PostgreSQL, MySQL, SQLite)
👥 Contributing
Kotodama is a 100% free and open source project.
For Developers
- Go 1.21+ required
- Test-driven development required
- Performance benchmarks for critical paths
- Documentation for public APIs
For Language Designers
- Create YAML language definitions
- Write transpilation rules
- Add examples and documentation
For Users
- Bug reports with minimal reproduction
- Feature requests with use cases
- Documentation improvements
📄 License
- Apache 2.0 for all code
- Creative Commons for documentation
📞 Contact & Resources
- Source Code: git.lan.thwap.org/thwap/kotodama
- Documentation: In this README and code comments
Kotodama – Simplify language tooling to the point where creating a new language takes hours, not months.