No description
  • Go 71.1%
  • Shell 23%
  • Makefile 5.9%
Find a file
2025-06-14 22:40:54 +02:00
cmd/analyze-batch Analyze multi batch 2025-06-05 18:05:45 +02:00
config adds avg response time, and adds more expected_tools_variants 2025-06-05 10:29:34 +02:00
models Adds total and average time per req 2025-06-03 11:32:01 +02:00
services Log requests and responses 2025-06-04 11:09:36 +02:00
tools Simplify 2025-06-14 22:40:54 +02:00
.gitignore clean up 2025-06-04 17:00:12 +02:00
ANALYSIS.md clean up 2025-06-04 17:00:12 +02:00
go.mod Initial commit 2025-06-02 17:31:13 +02:00
go.sum Initial commit 2025-06-02 17:31:13 +02:00
main.go Simplify 2025-06-14 22:40:54 +02:00
Makefile Simplify 2025-06-14 22:40:54 +02:00
README.md Simplify 2025-06-14 22:40:54 +02:00
test-all-models.sh Run tests by provider 2025-06-05 17:35:26 +02:00

Agent Loop Tool Efficiency Test

A Go application for testing AI models with function calling using an agent loop architecture. Tests tool calling efficiency, cart management scenarios, and provides detailed performance metrics.

Quick Start

# Clone and setup
git clone https://github.com/ilopezluna/model-test
cd model-test

# Run with default model
make run

# Run with specific model
make run MODEL="ai/llama3.2"

# Run single test case
make run TEST_CASE="simple_view_cart" MODEL="ai/gemma3"

Command Line Usage

Basic Usage

# Run all test cases with default model (gpt-4o-mini)
./model-test

# Run with specific model
./model-test --model "ai/qwen2.5"

# Run single test case
./model-test --test-case "simple_view_cart"

# Custom API settings
./model-test --model "gpt-4" --base-url "https://api.openai.com/v1" --api-key "your-key"

Command Line Flags

  -api-key string
        OpenAI API key (or set OPENAI_API_KEY env var) (default "DMR")
  -base-url string
        OpenAI API base URL (or set OPENAI_BASE_URL env var) (default "http://localhost:13434")
  -config string
        Path to test cases configuration file (default "config/test_cases.json")
  -model string
        Model to use (or set OPENAI_MODEL env var, defaults to gpt-4o-mini)
  -test-case string
        Run only the specified test case by name

Environment Variables

export OPENAI_API_KEY="your-api-key"
export OPENAI_BASE_URL="https://api.openai.com/v1"
export OPENAI_MODEL="gpt-4"

Make Commands

Basic Commands

# Run commands
make run                                    # Run with default values
make run MODEL="gpt-4"                     # Run with specific model
make run TEST_CASE="simple_view_cart"      # Run specific test case
make run MODEL="gpt-4" TEST_CASE="cart"    # Run with multiple parameters

# Test commands
make test                                  # Test all models
make test MODELS="gpt-4,claude-3"          # Test specific models
make test TEST_CASE="simple_view_cart"     # Test specific case
make test MODELS="gpt-4" TEST_CASE="cart"  # Test specific model and case

# Utility commands
make list-tests                            # List available test cases
make help                                  # Show all available commands

Development Commands

make build          # Build the application
make clean          # Clean build artifacts and results

Test Cases

The application includes 18 test cases covering:

  • Zero Tool Cases: Greetings, general questions (no tools expected)
  • Simple Cases: Single tool operations (search, add, view, remove, checkout)
  • Medium Cases: Two-step operations (search then add, remove then add)
  • Complex Cases: Multi-step workflows with cart management

Example Test Cases

  • zero_greeting - Simple greeting (no tools)
  • simple_search_electronics - Search for electronics
  • simple_add_iphone - Add iPhone to cart
  • medium_search_and_add - Search and add to cart
  • complex_cart_management - Multi-step cart organization (with initial cart state)

Output and Results

Result Files

Results are saved to results/ directory with format:

agent_test_results_<model>_<timestamp>.json

Examples:

  • agent_test_results_gpt-4_20250603_112616.json
  • agent_test_results_ai_llama3.2_20250603_112623.json
  • agent_test_results_gpt-4o-mini_20250603_112630.json

Performance Metrics

📈 Agent Test Results
==================================================
Total Tests: 18
✅ Passed: 15
❌ Failed: 3
⏱️  Total LLM Time: 12.4s
⏱️  Average Time per Request: 1.2s
📊 Overall Success Rate: 83.33%

Key Metrics

  • Total LLM Time: Time spent in actual LLM requests (excludes framework overhead)
  • Average Time per Request: Per individual LLM API call (not per test)
  • Tool Call Accuracy: Matches expected tool calling patterns
  • Success Rate: Percentage of tests that matched expected behavior

Configuration

Test Case Structure

{
  "name": "complex_cart_management",
  "prompt": "Help me organize my shopping cart...",
  "initial_cart_state": {
    "items": [
      {
        "product_name": "iPhone",
        "quantity": 2
      },
      {
        "product_name": "Wireless Headphones",
        "quantity": 1
      }
    ]
  },
  "expected_tools_variants": [
    
  ]
}

Available Tools

  • search_products - Search by query, category, or both
  • add_to_cart - Add products with quantity
  • remove_from_cart - Remove products from cart
  • view_cart - View cart contents and totals
  • checkout - Process checkout

Requirements

  • Go: 1.19+
  • Local AI Server: Docker Model Runner or Ollama
  • OR OpenAI API: With valid API key

Adding New Test Cases

  1. Add test case to config/test_cases.json
  2. Define expected tool call variants
  3. Optionally specify initial cart state
  4. Run with make run TEST_CASE="your_test_name"

Model Comparison

# Test multiple models
make test MODELS="gpt-4,gpt-4o-mini,ai/llama3.2"

# Or test them individually
make run MODEL="gpt-4"
make run MODEL="gpt-4o-mini"
make run MODEL="ai/llama3.2"