Kotodama Language Toolkit

A universal language toolchain framework written in Go that enables:

  • Multi-modal operation (framework, code generator, transpiler)
  • Polyglot execution through YAML/JSON language definitions
  • Unified AST format with boolean token optimization
  • Complete compiler/interpreter/LSP tooling ecosystem
  • 100% Free and Open Source

Kotodama (言霊) means "word spirit" in Japanese: the belief that words have the power to shape reality.

🚀 Current Status

Phase 1: Foundation is ~85% complete with all core components implemented and tested.

Completed Features

Core Infrastructure

  • Universal Tokenizer: Boolean flag token design with fast bitmask operations
  • Lexer Core Engine: Streaming lexer with error recovery and lookahead
  • DFA/NFA Pattern Matching: Regex-to-DFA compiler for YAML patterns
  • Unified AST Format: BSON-serializable nodes with source positions
  • Grammar Definition System: YAML loader with EBNF/BNF support
    • YAML Grammar Definition: Complete YAML grammar definition (grammar/yaml.yaml) for parsing YAML with Kotodama itself (meta-circular evaluation)
  • AST Transformation Engine: Pattern matching and rewrite rules for AST manipulation
    • Pattern Matching: Type-based, data field, and child structure matching
    • Rewrite Rules: Priority-based rule application with capture groups
    • Visitor Integration: Works with existing visitor pattern for custom traversals

Execution Engines

  • Tree-Walk Interpreter: Expression evaluator with native function binding
  • Bytecode VM: Stack-based bytecode with debug information
  • Go Code Generator: Generate lexer/parser from YAML definitions
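
To make the "stack-based bytecode" idea concrete, here is a minimal sketch of a stack machine evaluating 1 + 2 * 3. It is not the code in the bytecode package: the opcode names (OpPush, OpAdd, OpMul), the Instr type, and the run function are all hypothetical and chosen only for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// Op is a hypothetical opcode type; the real bytecode format may differ.
type Op byte

const (
	OpPush Op = iota // push Arg onto the stack
	OpAdd            // pop two values, push their sum
	OpMul            // pop two values, push their product
)

// Instr is one instruction with an optional integer operand.
type Instr struct {
	Op  Op
	Arg int
}

// run executes prog on an empty stack and returns the single remaining value.
func run(prog []Instr) (int, error) {
	var stack []int
	for _, in := range prog {
		switch in.Op {
		case OpPush:
			stack = append(stack, in.Arg)
		case OpAdd, OpMul:
			if len(stack) < 2 {
				return 0, errors.New("stack underflow")
			}
			a, b := stack[len(stack)-2], stack[len(stack)-1]
			stack = stack[:len(stack)-2]
			if in.Op == OpAdd {
				stack = append(stack, a+b)
			} else {
				stack = append(stack, a*b)
			}
		default:
			return 0, fmt.Errorf("unknown opcode %d", in.Op)
		}
	}
	if len(stack) != 1 {
		return 0, errors.New("program must leave exactly one value on the stack")
	}
	return stack[0], nil
}

func main() {
	// 1 + 2 * 3 in postfix order: 1 2 3 * +
	prog := []Instr{{OpPush, 1}, {OpPush, 2}, {OpPush, 3}, {OpMul, 0}, {OpAdd, 0}}
	result, err := run(prog)
	if err != nil {
		panic(err)
	}
	fmt.Println(result)
}
```

A compiler front end would emit the postfix instruction order from the AST, so operator precedence is already resolved by the time the VM runs.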

Practical Tools

  • JSON↔YAML Transpiler: Complete JSON parser using Kotodama's lexer/AST
  • Enhanced CLI Tool: Subcommand architecture with:
    • repl: Interactive REPL
    • run <file>: Execute Kotodama scripts (with shebang support)
    • ast <file>: Output AST in Graphviz DOT format
    • json2yaml / yaml2json: Data format conversion
    • lsp: Start Language Server Protocol server
    • version: Version information

LSP Server Framework

  • Language-agnostic LSP core: Works with any Kotodama language definition
  • Text document synchronization: Full and incremental sync support
  • Diagnostics: Real-time error reporting using lexer/parser errors
  • Extensible architecture: Easy to add language-specific features
  • JSON support: Full diagnostics for JSON files
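
As a rough illustration of how a parser error can become an LSP diagnostic, the sketch below maps a 1-based line/column error onto the zero-based range the protocol expects. The ParseError type and the toDiagnostic helper are hypothetical and not taken from the lsp package; the Diagnostic fields follow the LSP specification, where severity 1 means Error. The sketch assumes ASCII input, since LSP character offsets are counted in UTF-16 code units by default.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Position and Range mirror the LSP wire format (zero-based line and character).
type Position struct {
	Line      int `json:"line"`
	Character int `json:"character"`
}

type Range struct {
	Start Position `json:"start"`
	End   Position `json:"end"`
}

// Diagnostic carries the subset of LSP diagnostic fields used in this sketch.
type Diagnostic struct {
	Range    Range  `json:"range"`
	Severity int    `json:"severity"` // 1 = Error in the LSP specification
	Source   string `json:"source"`
	Message  string `json:"message"`
}

// ParseError is a hypothetical error shape with 1-based line and column.
type ParseError struct {
	Line, Column, Length int
	Msg                  string
}

// toDiagnostic converts the 1-based error position to a zero-based LSP range.
func toDiagnostic(e ParseError) Diagnostic {
	start := Position{Line: e.Line - 1, Character: e.Column - 1}
	end := Position{Line: start.Line, Character: start.Character + e.Length}
	return Diagnostic{
		Range:    Range{Start: start, End: end},
		Severity: 1,
		Source:   "kotodama",
		Message:  e.Msg,
	}
}

func main() {
	d := toDiagnostic(ParseError{Line: 3, Column: 5, Length: 2, Msg: "unexpected token"})
	out, err := json.Marshal(d)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```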

🔄 In Progress / Next Steps

  • Complete Go code generator (AST type generation, visitor pattern)
  • YAML grammar definition (parsing YAML with Kotodama itself) ✓
  • AST transformation engine with pattern matching ✓
  • LSP server framework ✓
  • Common language definitions (TOML, INI, CSV)

🏗️ Architecture

Core Components

kotodama/
├── token/         # Universal tokenizer (boolean flags)
├── lexer/         # Lexer generator (YAML → DFA)
├── parser/        # Parser generator (YAML → parser)
├── ast/           # Unified AST (BSON serializable)
├── transform/     # AST pattern matching and transformation
├── interpreter/   # Tree-walk interpreter
├── bytecode/      # IR/bytecode format
├── generator/     # Code generator (YAML → Go)
├── transpile/     # Source-to-source translation
├── grammar/       # Grammar definition loader
├── lsp/           # Language Server Protocol server
└── cmd/kotodama/  # CLI tool

AST Transformation Engine

The transform package provides powerful pattern matching and rewrite capabilities for AST manipulation:

import (
    "strings"

    "git.lan.thwap.org/thwap/kotodama/ast"
    "git.lan.thwap.org/thwap/kotodama/transform"
)

factory := &ast.NodeFactory{}
transformer := transform.NewTransformer(factory)

// Add a rule to convert string values to uppercase
transformer.AddRule(&transform.Rule{
    Name: "uppercase_strings",
    Pattern: transform.PatternOfType("String"),
    Transform: func(match *transform.MatchResult) *ast.Node {
        val, _ := match.Node.Data["value"].(string)
        upper := strings.ToUpper(val)
        return factory.CreateNode("String", map[string]interface{}{
            "value": upper,
        }, nil)
    },
    Priority: 10,
})

// Apply transformations (bottom-up by default)
transformedAST := transformer.Transform(originalAST)

Features:

  • Pattern Matching: Match nodes by type, data fields, and child structure
  • Capture Groups: Extract matched nodes for use in transformations
  • Rule Priorities: Control application order with priority levels
  • Multiple Traversal Strategies: Bottom-up, top-down, and visitor-based
  • Wildcard Support: Regex patterns for node types and values
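
For readers who want to see the mechanics behind priority-ordered, bottom-up rewriting, here is a self-contained sketch. It does not use the transform package: the Node, Rule, and Transformer types below are simplified stand-ins, and the choice that a higher Priority value wins is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Node is a simplified stand-in for an AST node.
type Node struct {
	Type     string
	Data     map[string]interface{}
	Children []*Node
}

// Rule pairs a match predicate with a rewrite function.
type Rule struct {
	Name      string
	Match     func(*Node) bool
	Transform func(*Node) *Node
	Priority  int
}

// Transformer keeps its rules sorted so higher priorities are tried first
// (an assumption of this sketch).
type Transformer struct{ rules []Rule }

func (t *Transformer) AddRule(r Rule) {
	t.rules = append(t.rules, r)
	sort.SliceStable(t.rules, func(i, j int) bool {
		return t.rules[i].Priority > t.rules[j].Priority
	})
}

// Transform rewrites children first (bottom-up), then applies the first
// matching rule to the rebuilt node. The input tree is not modified.
func (t *Transformer) Transform(n *Node) *Node {
	if n == nil {
		return nil
	}
	out := &Node{Type: n.Type, Data: n.Data}
	for _, c := range n.Children {
		out.Children = append(out.Children, t.Transform(c))
	}
	for _, r := range t.rules {
		if r.Match(out) {
			return r.Transform(out)
		}
	}
	return out
}

// ofType builds a predicate that matches nodes by type name.
func ofType(typ string) func(*Node) bool {
	return func(n *Node) bool { return n.Type == typ }
}

// upperStrings applies a single rule that uppercases String node values.
func upperStrings(root *Node) *Node {
	t := &Transformer{}
	t.AddRule(Rule{
		Name:     "uppercase_strings",
		Match:    ofType("String"),
		Priority: 10,
		Transform: func(n *Node) *Node {
			val, _ := n.Data["value"].(string)
			return &Node{Type: "String", Data: map[string]interface{}{"value": strings.ToUpper(val)}}
		},
	})
	return t.Transform(root)
}

func main() {
	root := &Node{Type: "Array", Children: []*Node{
		{Type: "String", Data: map[string]interface{}{"value": "hello"}},
		{Type: "Number", Data: map[string]interface{}{"value": 1}},
	}}
	out := upperStrings(root)
	fmt.Println(out.Children[0].Data["value"], out.Children[1].Data["value"])
}
```

Rewriting children before parents means a parent rule always sees already-transformed children, which is the usual reason to prefer bottom-up order for simplification passes.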

Three Operational Modes

  1. Framework Mode: Library for embedding in Go applications
  2. Code Generator Mode: A "YACC 2.0" style generator that emits lexer and parser code in target languages
  3. Transpiler Mode: Language-to-language translation

Key Innovations

Boolean Flag Token Design

type Token struct {
    Pos     Position  // 8 bytes: offset, line, column
    Span    int16     // Token length in runes
    Literal []byte    // Token text (slice into source)
    Type    TokenType // 2 bytes: category + subtype
}

// Fast boolean checks via bitmask operations
func (t Token) IsKeyword() bool { return t.Type & CategoryMask == TkKeyword }
func (t Token) IsIdentifier() bool { return t.Type & CategoryMask == TkIdentifier }
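
The snippet above leaves the constants undefined. The following sketch shows one way a 2-byte type could pack a category and a subtype so that the checks reduce to a single mask and compare. The bit layout (category in the high byte, subtype in the low byte) and every constant value are assumptions for illustration, not the values used in the token package.

```go
package main

import "fmt"

// TokenType packs a category into the high byte and a subtype into the low
// byte. This layout is an assumption of the sketch.
type TokenType uint16

const (
	CategoryMask TokenType = 0xFF00
	SubtypeMask  TokenType = 0x00FF

	TkKeyword    TokenType = 0x0100
	TkIdentifier TokenType = 0x0200

	// Hypothetical keyword subtypes: category bits OR'd with a small ordinal.
	TkFunc TokenType = TkKeyword | 0x01
	TkVar  TokenType = TkKeyword | 0x02
)

// IsKeyword reports whether the category bits equal TkKeyword.
func (t TokenType) IsKeyword() bool { return t&CategoryMask == TkKeyword }

// IsIdentifier reports whether the category bits equal TkIdentifier.
func (t TokenType) IsIdentifier() bool { return t&CategoryMask == TkIdentifier }

// Subtype extracts the low byte.
func (t TokenType) Subtype() uint8 { return uint8(t & SubtypeMask) }

func main() {
	fmt.Println(TkFunc.IsKeyword(), TkFunc.IsIdentifier(), TkVar.Subtype())
}
```

In Go, & binds tighter than ==, so t&CategoryMask == TkKeyword needs no extra parentheses.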

Unified AST Format (BSON Serializable)

type Node struct {
    ID       string                 `bson:"id"`      // UUID for referencing
    Type     NodeType               `bson:"type"`    // Node type
    Span     Span                   `bson:"span"`    // Source location
    Data     map[string]interface{} `bson:"data"`    // Node-specific data
    Children []*Node                `bson:"children,omitempty"`
    Metadata map[string]interface{} `bson:"meta,omitempty"`
}
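
To show what a serializable node tree looks like in practice, the sketch below round-trips a small tree through the standard library's encoding/json. JSON is used here only as a stand-in so the example has no third-party dependencies; the project itself targets BSON. The Span fields and the sample node types are assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type NodeType string

// Span is a hypothetical source range; the real Span type may carry more.
type Span struct {
	Start int `json:"start"`
	End   int `json:"end"`
}

// Node mirrors the struct above with json tags in place of bson tags.
type Node struct {
	ID       string                 `json:"id"`
	Type     NodeType               `json:"type"`
	Span     Span                   `json:"span"`
	Data     map[string]interface{} `json:"data"`
	Children []*Node                `json:"children,omitempty"`
	Metadata map[string]interface{} `json:"meta,omitempty"`
}

// roundTrip serializes a tree and decodes it back into a fresh tree.
func roundTrip(n *Node) (*Node, error) {
	raw, err := json.Marshal(n)
	if err != nil {
		return nil, err
	}
	var out Node
	if err := json.Unmarshal(raw, &out); err != nil {
		return nil, err
	}
	return &out, nil
}

// sample builds the tree for "1 + 2".
func sample() *Node {
	return &Node{
		ID: "n1", Type: "BinaryExpr", Span: Span{Start: 0, End: 5},
		Data: map[string]interface{}{"op": "+"},
		Children: []*Node{
			{ID: "n2", Type: "Number", Span: Span{Start: 0, End: 1}, Data: map[string]interface{}{"value": 1.0}},
			{ID: "n3", Type: "Number", Span: Span{Start: 4, End: 5}, Data: map[string]interface{}{"value": 2.0}},
		},
	}
}

func main() {
	out, err := roundTrip(sample())
	if err != nil {
		panic(err)
	}
	fmt.Println(out.Type, len(out.Children), out.Children[1].Data["value"])
}
```

Because Data is an untyped map, numeric values come back as float64 after a JSON round trip; a typed accessor layer is one way to hide that from callers.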

🚀 Getting Started

Prerequisites

  • Go 1.21+ (currently using 1.24.11)
  • Graphviz (optional, for AST visualization)

Installation

# Clone the repository
git clone https://git.lan.thwap.org/thwap/kotodama.git
cd kotodama

# Build the CLI tool
make build
# or
go build ./cmd/kotodama

# Run tests
make test

Quick Examples

Interactive REPL

./kotodama repl
> 1 + 2 * 3
7
> print("Hello, " + "world!")
Hello, world!
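
The first REPL result depends on * binding tighter than +. One common way to get that behaviour is precedence climbing, sketched below for integer arithmetic. This is an independent illustration, not the interpreter's code; the eval function, the parser type, and the precedence table are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
)

// prec maps each binary operator to its binding strength.
var prec = map[byte]int{'+': 1, '-': 1, '*': 2, '/': 2}

type parser struct {
	src string
	pos int
}

func (p *parser) skipSpace() {
	for p.pos < len(p.src) && p.src[p.pos] == ' ' {
		p.pos++
	}
}

// primary parses one non-negative integer literal.
func (p *parser) primary() (int, error) {
	p.skipSpace()
	start := p.pos
	for p.pos < len(p.src) && p.src[p.pos] >= '0' && p.src[p.pos] <= '9' {
		p.pos++
	}
	if start == p.pos {
		return 0, fmt.Errorf("expected number at offset %d", start)
	}
	return strconv.Atoi(p.src[start:p.pos])
}

// expr parses operators whose precedence is at least minPrec. Recursing with
// pr+1 for the right operand makes every operator left-associative.
func (p *parser) expr(minPrec int) (int, error) {
	lhs, err := p.primary()
	if err != nil {
		return 0, err
	}
	for {
		p.skipSpace()
		if p.pos >= len(p.src) {
			return lhs, nil
		}
		op := p.src[p.pos]
		pr, ok := prec[op]
		if !ok {
			return 0, fmt.Errorf("unexpected %q at offset %d", op, p.pos)
		}
		if pr < minPrec {
			return lhs, nil
		}
		p.pos++
		rhs, err := p.expr(pr + 1)
		if err != nil {
			return 0, err
		}
		switch op {
		case '+':
			lhs += rhs
		case '-':
			lhs -= rhs
		case '*':
			lhs *= rhs
		case '/':
			if rhs == 0 {
				return 0, fmt.Errorf("division by zero")
			}
			lhs /= rhs
		}
	}
}

// eval evaluates an arithmetic expression over integers.
func eval(src string) (int, error) {
	p := &parser{src: src}
	return p.expr(1)
}

func main() {
	v, err := eval("1 + 2 * 3")
	if err != nil {
		panic(err)
	}
	fmt.Println(v)
}
```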

Execute a Script

#!/usr/bin/env kotodama run
# test.ktdm
print("1 + 2 * 3 =", 1 + 2 * 3)
print("Length of 'hello':", len("hello"))

./kotodama run test.ktdm
# Output:
# 1 + 2 * 3 = 7
# Length of 'hello': 5

JSON ↔ YAML Transpilation

# JSON to YAML
echo '{"name": "John", "age": 30}' | ./kotodama json2yaml
# Output:
# age: 30
# name: John

# YAML to JSON
echo -e 'name: John\nage: 30' | ./kotodama yaml2json
# Output:
# {
#   "age": 30,
#   "name": "John"
# }
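
Note that the keys come out in alphabetical order in both directions. The sketch below reproduces that behaviour for flat JSON objects only. It uses the standard library's encoding/json rather than Kotodama's own lexer and AST, and the flatJSONToYAML helper is hypothetical; nested values and strings that need YAML quoting are outside its scope.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strings"
)

// flatJSONToYAML converts a flat JSON object to YAML with sorted keys.
func flatJSONToYAML(in []byte) (string, error) {
	var obj map[string]interface{}
	if err := json.Unmarshal(in, &obj); err != nil {
		return "", err
	}
	keys := make([]string, 0, len(obj))
	for k := range obj {
		keys = append(keys, k)
	}
	sort.Strings(keys) // map iteration order is random, so sort for stable output

	var b strings.Builder
	for _, k := range keys {
		switch v := obj[k].(type) {
		case string, float64, bool:
			fmt.Fprintf(&b, "%s: %v\n", k, v)
		case nil:
			fmt.Fprintf(&b, "%s: null\n", k)
		default:
			return "", fmt.Errorf("nested value for key %q is outside this sketch", k)
		}
	}
	return b.String(), nil
}

func main() {
	out, err := flatJSONToYAML([]byte(`{"name": "John", "age": 30}`))
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```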

AST Visualization

# Create a simple script
echo 'print(1 + 2 * 3)' > example.ktdm

# Generate Graphviz DOT output
./kotodama ast example.ktdm > ast.dot

# Convert to image (requires Graphviz)
dot -Tpng ast.dot -o ast.png
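
For a sense of what DOT output for a tree involves, here is a minimal emitter. The Node type, the toDOT function, and the exact output layout are assumptions of this sketch; the actual output of the ast subcommand may be formatted differently.

```go
package main

import (
	"fmt"
	"strings"
)

// Node is a simplified tree node with a display label.
type Node struct {
	Label    string
	Children []*Node
}

// toDOT renders the tree as a Graphviz digraph. Nodes are numbered in
// pre-order, and each parent gets one edge per child.
func toDOT(root *Node) string {
	var b strings.Builder
	b.WriteString("digraph AST {\n")
	next := 0
	var walk func(n *Node) int
	walk = func(n *Node) int {
		id := next
		next++
		// %q yields a double-quoted string, which DOT accepts for simple labels.
		fmt.Fprintf(&b, "  n%d [label=%q];\n", id, n.Label)
		for _, c := range n.Children {
			cid := walk(c)
			fmt.Fprintf(&b, "  n%d -> n%d;\n", id, cid)
		}
		return id
	}
	walk(root)
	b.WriteString("}\n")
	return b.String()
}

func main() {
	// Tree for 1 + 2 * 3
	tree := &Node{Label: "+", Children: []*Node{
		{Label: "1"},
		{Label: "*", Children: []*Node{{Label: "2"}, {Label: "3"}}},
	}}
	fmt.Print(toDOT(tree))
}
```

The output can be piped to dot in the same way as the file generated above.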

📚 Language Definitions

JSON Grammar

Complete JSON parser implementation in grammar/json.yaml:

  • Objects, arrays, strings, numbers, booleans, null
  • Uses Kotodama's lexer and AST
  • Integrated into JSON↔YAML transpiler

Creating New Languages

Define languages in YAML format:

# example.yaml
name: "Example Language"
version: "1.0"

lexer:
  tokens:
    identifier: "[a-zA-Z_][a-zA-Z0-9_]*"
    number: "[0-9]+"
  
  keywords:
    func: TkFunc
    var: TkVar
  
  operators:
    "+": TkAdd
    "-": TkSub

grammar:
  start: "Program"
  rules:
    Program: "Statement*"
    Statement: "Expression ;"
    Expression: "Term (('+' | '-') Term)*"
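
The identifier and number patterns in the definition above are regular expressions, which a lexer generator would typically compile into a DFA. The sketch below hand-writes the DFA for those two patterns to show what the compiled form amounts to. The state names, the class function, and longestMatch are hypothetical and do not reflect the lexer package's generated code.

```go
package main

import "fmt"

type state int

const (
	start state = iota
	inIdent
	inNumber
	dead
)

// class maps a byte to a column of the transition table:
// 0 = letter or underscore, 1 = digit, 2 = anything else.
func class(c byte) int {
	switch {
	case c == '_' || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'):
		return 0
	case c >= '0' && c <= '9':
		return 1
	}
	return 2
}

// table encodes [a-zA-Z_][a-zA-Z0-9_]* and [0-9]+ as one DFA.
var table = [3][3]state{
	start:    {inIdent, inNumber, dead},
	inIdent:  {inIdent, inIdent, dead},
	inNumber: {dead, inNumber, dead},
}

// longestMatch runs the DFA from the beginning of src and returns the kind
// and length of the longest token found, or ("", 0) if none matches.
func longestMatch(src string) (kind string, length int) {
	s := start
	for i := 0; i < len(src); i++ {
		s = table[s][class(src[i])]
		if s == dead {
			break
		}
		// Both live non-start states are accepting, so record the match.
		length = i + 1
		if s == inIdent {
			kind = "identifier"
		} else {
			kind = "number"
		}
	}
	return kind, length
}

func main() {
	for _, src := range []string{"foo_1 + 2", "42abc", "+"} {
		kind, n := longestMatch(src)
		fmt.Printf("%q -> %q %d\n", src, kind, n)
	}
}
```

A table-driven DFA does a constant amount of work per input byte, which is why lexer generators prefer it to backtracking regex matching.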

🛠️ API Usage

Embedding in Go Applications

package main

import (
    "fmt"
    "git.lan.thwap.org/thwap/kotodama/ast"
    "git.lan.thwap.org/thwap/kotodama/interpreter"
    "git.lan.thwap.org/thwap/kotodama/lexer"
    "git.lan.thwap.org/thwap/kotodama/parser"
)

func main() {
    src := []byte("1 + 2 * 3")
    l := lexer.NewFromBytes(src)
    factory := &ast.NodeFactory{}
    p := parser.NewPrattParser(l, &parser.Config{Mode: parser.ModePratt}, factory)
    node, err := p.Parse()
    if err != nil {
        panic(err)
    }
    
    env := interpreter.NewEnvironment()
    eval := interpreter.NewEvaluator(env)
    result, err := eval.Evaluate(node)
    if err != nil {
        panic(err)
    }
    fmt.Println(result) // Output: 7
}

Using the Code Generator

package main

import (
    "fmt"
    "os"

    "git.lan.thwap.org/thwap/kotodama/generator"
    "git.lan.thwap.org/thwap/kotodama/grammar"
)

func main() {
    // Load grammar from YAML
    data, err := os.ReadFile("grammar/json.yaml")
    if err != nil {
        panic(err)
    }
    g, err := grammar.LoadFromYAML(data)
    if err != nil {
        panic(err)
    }
    
    // Generate Go parser code
    code, err := generator.GenerateRecursiveDescent(g, generator.Config{
        PackageName: "jsonparser",
        Visitor:     true,
    })
    if err != nil {
        panic(err)
    }
    
    fmt.Println(code)
}

🧪 Testing

# Run all tests
make test

# Run specific test suites
make test-lexer
make test-parser
make test-generator
make test-interpreter

# Run with verbose output
make test-verbose

# Generate coverage report
make coverage
# Open coverage.html in browser

📊 Performance

Tokenizer Targets

  • < 100ns per token on average
  • Zero allocations for common token paths
  • < 50MB memory for 1MB source file
  • Support for files > 1GB via streaming
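
The zero-allocation target can be checked directly with testing.AllocsPerRun from the standard library. The sketch below measures a toy whitespace tokenizer whose tokens are sub-slices of the source buffer. The Token shape, nextToken, countTokens, and allocsPerScan are hypothetical and much simpler than the real lexer.

```go
package main

import (
	"fmt"
	"testing"
)

// Token holds a sub-slice of the source, so producing one copies no bytes.
type Token struct {
	Offset  int
	Literal []byte
}

// nextToken returns the next space-delimited token at or after pos, the
// position just past it, and false when the input is exhausted.
func nextToken(src []byte, pos int) (Token, int, bool) {
	for pos < len(src) && src[pos] == ' ' {
		pos++
	}
	if pos >= len(src) {
		return Token{}, pos, false
	}
	start := pos
	for pos < len(src) && src[pos] != ' ' {
		pos++
	}
	return Token{Offset: start, Literal: src[start:pos]}, pos, true
}

// countTokens scans the whole input and returns the number of tokens.
func countTokens(src []byte) int {
	n, pos := 0, 0
	for {
		_, next, ok := nextToken(src, pos)
		if !ok {
			return n
		}
		n++
		pos = next
	}
}

// allocsPerScan reports the average number of heap allocations per full scan.
func allocsPerScan(src []byte) float64 {
	return testing.AllocsPerRun(100, func() { countTokens(src) })
}

func main() {
	src := []byte("var x = 1 + 2 * 3")
	fmt.Println("tokens:", countTokens(src), "allocs per scan:", allocsPerScan(src))
}
```

Returning tokens by value and slicing into the source buffer is what keeps the scan off the heap; storing tokens in a growing slice would reintroduce allocations.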

Current Benchmarks

Run benchmarks with:

make bench

🗺️ Roadmap

See ROADMAP.md for detailed planning.

Immediate Goals (Next 1-2 weeks)

  1. Complete Go code generator with AST type generation
  2. YAML grammar definition (parsing YAML with Kotodama) ✓
  3. AST transformation engine with pattern matching ✓

Phase 2: Tooling Ecosystem (Months 4-6)

  • LSP server framework ✓
  • Code formatter
  • Debugger interface
  • Plugin system

Phase 3: Language Expansion (Months 7-12)

  • Shell implementations (bash, fish, PowerShell)
  • Programming language subsets (Go, Python, JavaScript)
  • SQL dialects (PostgreSQL, MySQL, SQLite)

👥 Contributing

Kotodama is a 100% free and open source project.

For Developers

  • Go 1.21+ required
  • Test-driven development required
  • Performance benchmarks for critical paths
  • Documentation for public APIs

For Language Designers

  • Create YAML language definitions
  • Write transpilation rules
  • Add examples and documentation

For Users

  • Bug reports with minimal reproduction
  • Feature requests with use cases
  • Documentation improvements

📄 License

Apache 2.0 for all code; Creative Commons for documentation.

📞 Contact & Resources

  • Source Code: git.lan.thwap.org/thwap/kotodama
  • Documentation: In this README and code comments

Kotodama: Simplify language tooling to the point where creating a new language takes hours, not months.