Evaluators
Evaluators are the scoring functions that determine whether an evolved solution is better than the original. They measure performance, verify correctness, and rank candidates. Without a good evaluator, Evolve can’t distinguish a genuine improvement from a regression.

How Evaluators Work
During each iteration of an evolution:
- The LLM generates a new candidate program
- The evaluator runs against the candidate
- A fitness score is produced
- The score determines whether the candidate survives to the next generation
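The loop above can be sketched in a few lines of Python. This is an illustrative toy, not Kai's actual implementation: `generate_candidate` stands in for the LLM call, and the fitness function is a placeholder.

```python
import random

def generate_candidate(parent: str) -> str:
    # Placeholder mutation; a real system would ask the LLM for a rewrite.
    return parent + "  # candidate"

def evaluate(program: str) -> float:
    # Placeholder fitness; a real evaluator would benchmark the program.
    return float(len(program))

def evolve_step(population: list[str]) -> list[str]:
    parent = random.choice(population)
    candidate = generate_candidate(parent)   # 1. LLM generates a candidate
    score = evaluate(candidate)              # 2-3. evaluator produces a fitness score
    best_existing = max(evaluate(p) for p in population)
    if score > best_existing:                # 4. only improvements survive
        population.append(candidate)
    return population

pop = evolve_step(["def f(x): return x * 2"])
```

The key property is that the evaluator, not the LLM, decides survival: a candidate only joins the population if its score beats the best existing one.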
Evaluator Strategies
Kai supports three approaches to evaluation.

Auto-Generated
Kai analyzes your code scopes and automatically generates evaluators that:
- Measure execution time and throughput
- Verify correctness against expected outputs
- Compare results with the original implementation
- Score candidates across multiple dimensions
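A generated evaluator of this kind might look like the following sketch. All names here are illustrative, not Kai's actual API; the point is the shape: verify correctness against the original first, then fold performance into a single fitness score.

```python
import time

def evaluate(candidate_fn, original_fn, test_inputs):
    # Correctness: the candidate must match the original on every input.
    if not all(candidate_fn(x) == original_fn(x) for x in test_inputs):
        return 0.0  # regressions are rejected outright

    # Performance: wall-clock time over the test inputs.
    start = time.perf_counter()
    for x in test_inputs:
        candidate_fn(x)
    elapsed = time.perf_counter() - start

    # Faster candidates earn higher scores.
    return 1.0 / (elapsed + 1e-9)

def original(n):
    return sum(range(n))

def candidate(n):
    return n * (n - 1) // 2  # closed-form version of the same sum

score = evaluate(candidate, original, test_inputs=[10, 100, 1000])
```

Gating on correctness before measuring speed ensures a fast-but-wrong candidate can never outscore a correct one.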
Existing
If you already have benchmark scripts, test harnesses, or evaluation functions in your repository, you can point Evolve to them directly. This is useful when:
- You have established performance benchmarks
- Your evaluation criteria are complex or domain-specific
- You want full control over how solutions are scored
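An existing benchmark that Evolve could be pointed at might look like this. The shape shown (a function returning a numeric score, higher is better) is an assumption for illustration, not a documented contract.

```python
import timeit

def benchmark_sort(sort_fn) -> float:
    # Fixed workload so runs are comparable across candidates.
    data = list(range(1000, 0, -1))
    runtime = timeit.timeit(lambda: sort_fn(list(data)), number=50)
    # Invert runtime so that faster implementations score higher.
    return 1.0 / runtime

score = benchmark_sort(sorted)
```

Because the workload and scoring rule live in your repository, you keep full control over what "better" means.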
LLM Only
For exploratory optimizations where formal benchmarking is difficult, Kai can use LLM-based scoring. The LLM evaluates the evolved code based on code quality, algorithmic complexity analysis, and pattern recognition against known optimizations. This strategy is less precise than automated benchmarking, but it is useful for early exploration, or when the cost of writing a formal evaluator outweighs the expected gain from optimizing the target.

Evaluator Generation
When you select the auto-generated strategy, Kai:
- Analyzes the code scopes you’ve selected
- Identifies inputs, outputs, and expected behavior
- Generates a scoring function that captures performance characteristics
- Validates the evaluator against your existing code before starting the evolution
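The final validation step can be sketched as follows. This is a hypothetical illustration of the idea, not Kai's code: the generated evaluator is run against the unmodified original first, and if the original fails its own evaluator, the evaluator (not your code) is the suspect.

```python
def validate_evaluator(evaluator, original_fn) -> bool:
    # The original implementation must produce a positive baseline score;
    # otherwise every candidate would be compared against a broken yardstick.
    baseline = evaluator(original_fn)
    return isinstance(baseline, float) and baseline > 0.0

def sample_evaluator(fn) -> float:
    # Toy generated evaluator: checks one expected input/output pair.
    return 1.0 if fn(4) == 16 else 0.0

ok = validate_evaluator(sample_evaluator, lambda n: n * n)
```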
Custom Scopes
Evaluators are tied to specific code scopes: the file and line ranges that Evolve will optimize. When configuring scopes, you define exactly which parts of your code the evaluator should measure. Narrower scopes produce more focused optimizations. Broader scopes give Evolve more room to discover structural improvements but require evaluators that can measure the impact across a larger surface area.

Next Steps
- Getting Started - Walk through the full evolution setup flow
- Understanding Results - Interpret fitness scores and evolution metrics