Evaluators

Evaluators are the scoring functions that determine whether an evolved solution is better than the original. They measure performance, verify correctness, and rank candidates. Without a good evaluator, Evolve can’t distinguish a genuine improvement from a regression.

How Evaluators Work

During each iteration of an evolution:

The LLM generates a new candidate program
The evaluator runs against the candidate
A fitness score is produced
The score determines whether the candidate survives to the next generation

This feedback loop is what drives the evolutionary process. The evaluator provides the objective, quantifiable assessment that lets Evolve improve systematically rather than randomly. Evaluators run in Kai’s cloud infrastructure alongside the evolution, so they don’t require any local setup.
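The loop described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Kai's implementation. `evolve`, `propose` (standing in for the LLM) and `evaluate` (standing in for the evaluator) are hypothetical names, and survival is modeled simply as keeping the highest-scoring candidates.

```python
from typing import Callable, List, Tuple, TypeVar

C = TypeVar("C")  # a candidate program: source text, an AST, or (in the toy below) an int


def evolve(
    original: C,
    propose: Callable[[C], C],
    evaluate: Callable[[C], float],
    iterations: int = 10,
    population_size: int = 4,
) -> Tuple[C, float]:
    """Toy evolution loop: higher fitness is better, and the top candidates survive."""
    population: List[Tuple[float, C]] = [(evaluate(original), original)]
    for _ in range(iterations):
        # 1. Generate a new candidate from the current best (stands in for the LLM).
        parent = max(population, key=lambda item: item[0])[1]
        child = propose(parent)
        # 2-3. Run the evaluator on the candidate to produce a fitness score.
        population.append((evaluate(child), child))
        # 4. Keep only the highest-scoring candidates for the next generation.
        population.sort(key=lambda item: item[0], reverse=True)
        del population[population_size:]
    best_score, best = population[0]
    return best, best_score


if __name__ == "__main__":
    # Toy usage: candidates are integers, and fitness peaks at 42.
    best, score = evolve(0, lambda x: x + 1, lambda x: -abs(x - 42), iterations=100)
    print(best, score)  # prints: 42 0
```

Because the original stays in the population until something scores higher, a candidate that regresses never displaces it; that is the property a good evaluator makes possible.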

Evaluator Strategies

Kai supports three approaches to evaluation.

Auto-Generated

Kai analyzes your code scopes and automatically generates evaluators that:

Measure execution time and throughput
Verify correctness against expected outputs
Compare results with the original implementation
Score candidates across multiple dimensions

This is the default strategy and works well for most optimization tasks. Kai uses LLM-based analysis to understand what your code does and generates an evaluator that captures the performance characteristics relevant to it.
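As a rough illustration of those dimensions, the sketch below scores a candidate function against the original. It treats the original's outputs as the expected outputs, measures best-of-N wall-clock time, and gates the speed score on correctness. The function names, the speedup metric, and the zero-score rule for incorrect candidates are assumptions for illustration; the evaluators Kai generates may measure and weight things differently.

```python
import time
from typing import Any, Callable, Dict, Sequence, Tuple

Args = Tuple[Any, ...]


def measure_seconds(fn: Callable[..., Any], cases: Sequence[Args], repeats: int = 3) -> float:
    """Best-of-N wall-clock time for running fn over every input case."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for args in cases:
            fn(*args)
        best = min(best, time.perf_counter() - start)
    return best


def evaluate_candidate(
    candidate: Callable[..., Any],
    original: Callable[..., Any],
    cases: Sequence[Args],
) -> Dict[str, float]:
    """Score a candidate on correctness and speed relative to the original."""
    # The original implementation's outputs serve as the expected outputs.
    expected = [original(*args) for args in cases]
    passed = sum(candidate(*args) == want for args, want in zip(cases, expected))
    correctness = passed / len(cases)

    baseline_s = measure_seconds(original, cases)
    candidate_s = measure_seconds(candidate, cases)
    speedup = baseline_s / candidate_s if candidate_s > 0 else float("inf")

    # Gate speed on correctness: a wrong candidate scores 0 however fast it is.
    score = speedup if correctness == 1.0 else 0.0
    return {"correctness": correctness, "speedup": speedup, "score": score}


if __name__ == "__main__":
    def slow_sum(n: int) -> int:
        return sum(range(n))

    def fast_sum(n: int) -> int:
        return n * (n - 1) // 2

    print(evaluate_candidate(fast_sum, slow_sum, [(10,), (1_000,), (100_000,)]))
```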

Existing

If you already have benchmark scripts, test harnesses, or evaluation functions in your repository, you can point Evolve to them directly. This is useful when:

You have established performance benchmarks
Your evaluation criteria are complex or domain-specific
You want full control over how solutions are scored

In the manual start flow, specify the file path to your evaluator. In the guided flow, Kai detects existing evaluators and suggests them where appropriate.
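This page does not specify the interface an existing evaluator file must expose, so the following is only a generic example of the kind of benchmark script a repository might already contain. It checks correctness against a known output and reports timing as JSON. The file name `benchmark.py`, the placeholder `target` function, and the JSON fields are hypothetical.

```python
"""benchmark.py: a hypothetical, self-contained benchmark script."""
import json
import timeit
from typing import Dict, List, Union


def target(data: List[int]) -> List[int]:
    # Placeholder for the code under optimization (hypothetical).
    return sorted(data)


def run_benchmark() -> Dict[str, Union[bool, float]]:
    data = list(range(10_000, 0, -1))
    # Correctness first: compare against a known-good expected output.
    if target(data) != list(range(1, 10_001)):
        return {"correct": False}
    # Best of 5 repeats, 10 calls each, reported as seconds per call.
    seconds = min(timeit.repeat(lambda: target(data), number=10, repeat=5)) / 10
    return {"correct": True, "seconds_per_call": seconds}


if __name__ == "__main__":
    print(json.dumps(run_benchmark()))
```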

LLM Only

For exploratory optimizations where formal benchmarking is difficult, Kai can use LLM-based scoring. The LLM scores the evolved code by assessing code quality, analyzing algorithmic complexity, and comparing it against known optimization patterns. This strategy is less precise than automated benchmarking, but it is useful for early exploration or when the cost of writing a formal evaluator outweighs the expected gain from the optimization.
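A minimal sketch of rubric-based LLM scoring, assuming the three criteria above are rated individually and combined with fixed weights. `ask_llm` is a hypothetical callable that sends a prompt to some model and returns its text reply; the rubric keys, the weights, and the 0 to 10 rating scale are illustrative, not Kai's actual scoring scheme.

```python
import json
from typing import Callable, Dict

# Hypothetical rubric based on the criteria above; the weights are illustrative.
RUBRIC: Dict[str, float] = {
    "code_quality": 0.3,
    "algorithmic_complexity": 0.4,
    "known_optimizations": 0.3,
}


def build_prompt(original: str, candidate: str) -> str:
    criteria = ", ".join(RUBRIC)
    return (
        "Rate the candidate against the original on each criterion from 0 to 10 "
        f"({criteria}). Reply with a JSON object mapping criterion to score.\n\n"
        f"ORIGINAL:\n{original}\n\nCANDIDATE:\n{candidate}\n"
    )


def llm_score(original: str, candidate: str, ask_llm: Callable[[str], str]) -> float:
    """Combine per-criterion LLM ratings into a weighted score in [0, 1]."""
    reply = json.loads(ask_llm(build_prompt(original, candidate)))  # assumes a JSON object
    total = 0.0
    for criterion, weight in RUBRIC.items():
        # Missing criteria count as 0; out-of-range ratings are clamped to [0, 10].
        rating = min(max(float(reply.get(criterion, 0)), 0.0), 10.0)
        total += weight * rating / 10
    return total
```

Clamping and defaulting matter here because an LLM reply is not guaranteed to follow the requested format, which is part of why this strategy is less precise than benchmarking.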

Evaluator Generation

When you select the auto-generated strategy, Kai:

Analyzes the code scopes you’ve selected
Identifies inputs, outputs, and expected behavior
Generates a scoring function that captures performance characteristics
Validates the evaluator against your existing code before starting the evolution

You can track evaluator generation progress during the evaluator review step of the setup flow.
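The validation step can be thought of as a pre-flight check on the original code. The sketch below assumes an evaluator is a callable that returns a numeric score, and checks that it runs without errors, returns finite values, and gives stable results across repeated runs. `validate_evaluator` and the 10% spread threshold are hypothetical choices, not Kai's documented criteria.

```python
import math
import statistics
from typing import Callable, List


def validate_evaluator(
    evaluate: Callable[[str], float],
    original: str,
    runs: int = 5,
    max_rel_spread: float = 0.1,
) -> List[str]:
    """Return a list of problems found; an empty list means the evaluator looks usable."""
    scores: List[float] = []
    for i in range(runs):
        try:
            score = evaluate(original)
        except Exception as exc:  # the evaluator must run cleanly on the existing code
            return [f"run {i} raised {type(exc).__name__}: {exc}"]
        if not math.isfinite(score):
            return [f"run {i} returned non-finite score {score!r}"]
        scores.append(score)

    problems: List[str] = []
    mean = statistics.fmean(scores)
    spread = max(scores) - min(scores)
    # Noisy scores make it impossible to tell a real improvement from measurement jitter.
    if mean != 0 and spread / abs(mean) > max_rel_spread:
        problems.append(f"scores vary by {spread / abs(mean):.0%} across {runs} runs")
    return problems
```

Stability matters because fitness differences smaller than the evaluator's run-to-run noise cannot reliably separate a genuine improvement from a regression.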

Custom Scopes

Evaluators are tied to specific code scopes: the files and line ranges that Evolve will optimize. When configuring scopes, you define exactly which parts of your code the evaluator should measure. Narrower scopes produce more focused optimizations. Broader scopes give Evolve more room to discover structural improvements but require evaluators that can measure the impact across a larger surface area.
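One way to picture a scope is as a file path plus an inclusive line range. The sketch below, with a hypothetical `Scope` class and `outside_scopes` helper, represents scopes that way and flags any edited lines that fall outside them; how Kai actually stores and enforces scopes is not described on this page.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Scope:
    """A hypothetical code scope: a file plus an inclusive, 1-indexed line range."""

    path: str
    start_line: int
    end_line: int

    def __post_init__(self) -> None:
        if not 1 <= self.start_line <= self.end_line:
            raise ValueError(f"invalid line range {self.start_line}-{self.end_line}")

    @property
    def size(self) -> int:
        # Number of lines covered; a rough measure of how broad the scope is.
        return self.end_line - self.start_line + 1

    def contains(self, path: str, line: int) -> bool:
        return path == self.path and self.start_line <= line <= self.end_line


def outside_scopes(edits: Iterable[Tuple[str, int]], scopes: Iterable[Scope]) -> List[Tuple[str, int]]:
    """Return the (path, line) edits that no configured scope covers."""
    scope_list = list(scopes)
    return [
        (path, line)
        for path, line in edits
        if not any(s.contains(path, line) for s in scope_list)
    ]
```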

Next Steps

Getting Started - Walk through the full evolution setup flow
Understanding Results - Interpret fitness scores and evolution metrics
