Evaluation And Stopping

Keep scoring, feedback, and termination separate inside Mesmer experiments.

Evaluation, feedback, and stopping are separate responsibilities, each handled by its own operator.

Evaluation

ops.Evaluate records facts. Evaluators produce scores, labels, reasons, or other structured facts about candidate trajectories and target responses.

Use provider-enforced structured output for LLM-backed primitives that need machine-readable results; reserve free-text LLM calls for natural-language outputs.
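To illustrate evaluation-as-fact-recording, the sketch below shows an evaluator emitting a structured record rather than a bare pass/fail. The names (`Evaluation`, `contains_evaluator`) are hypothetical and not the Mesmer API; it is a minimal sketch of the idea, assuming a score/label/reason shape for recorded facts.

```python
from dataclasses import dataclass

# Hypothetical sketch -- not the Mesmer API. An evaluator produces a
# structured fact about a response: a score, a machine-readable label,
# and a human-readable reason. It records; it does not stop anything.
@dataclass(frozen=True)
class Evaluation:
    score: float   # e.g. 0.0-1.0
    label: str     # machine-readable verdict
    reason: str    # human-readable justification

def contains_evaluator(needle: str, response: str) -> Evaluation:
    hit = needle in response
    return Evaluation(
        score=1.0 if hit else 0.0,
        label="contains" if hit else "missing",
        reason=f"substring {needle!r} {'found' if hit else 'not found'}",
    )
```

Because the output is structured, downstream operators can consume it without parsing free text.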

Common response-shape evaluators include:

  • evaluators.Contains for exact substring checks.
  • evaluators.StartsWith for prefix checks, such as affirmative-start responses.
  • evaluators.NotContainsAny for asserting the absence of configured blocked phrases, such as weak refusal-shape checks.
  • evaluators.JudgePanel for aggregating multiple response evaluators into one panel result.

JudgePanel is still an evaluator. It writes evidence that later operators can consume; it does not decide runtime termination by itself.
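A panel can be sketched as a function that runs its member evaluators and writes one aggregated record. The names below are hypothetical, not the Mesmer API, and aggregation by mean score is an assumption; the real policy may differ.

```python
# Hypothetical sketch -- not the Mesmer API. A panel runs several
# response evaluators over one response and aggregates their scores.
def contains(needle):
    """Build a minimal substring evaluator returning a 0.0/1.0 score."""
    return lambda response: 1.0 if needle in response else 0.0

def judge_panel(members, response):
    scores = [member(response) for member in members]
    # The panel records evidence for later operators to consume;
    # it does not decide runtime termination itself.
    return {"scores": scores, "mean": sum(scores) / len(scores)}

result = judge_panel([contains("yes"), contains("sure")], "yes, sure thing")
```

The aggregated record is still just evidence: a stopping condition elsewhere decides what to do with it.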

Stopping

ops.StopWhen consumes facts. Conditions such as conditions.ScoreAtLeast decide when the runtime should stop based on recorded evidence.
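The consuming side can be sketched as a condition that reads recorded facts and returns a halt decision. The names (`score_at_least`, the `facts` shape) are hypothetical illustrations, not the Mesmer API; this is a minimal sketch assuming facts carry a numeric `score`.

```python
# Hypothetical sketch -- not the Mesmer API. A stop condition consumes
# previously recorded facts; it performs no scoring of its own.
def score_at_least(threshold):
    def condition(facts):
        # Halt once any recorded evaluation meets the threshold.
        return any(f["score"] >= threshold for f in facts)
    return condition

facts = [{"score": 0.4}, {"score": 0.9}]
stop = score_at_least(0.8)
```

Keeping the condition fact-driven means the same evidence can be replayed later to audit why a run stopped.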

Feedback

Feedback is not evaluation. ops.AddFeedback converts observations and evaluations into future attacker context or learning state.
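The distinction can be sketched as follows: feedback transforms an existing evaluation into context for the next attempt, without producing a score or a halt decision. The names (`add_feedback`, the evaluation dict shape) are hypothetical, not the Mesmer API.

```python
# Hypothetical sketch -- not the Mesmer API. Feedback converts an
# already-recorded evaluation into context for future attempts;
# it neither scores the response nor stops the run.
def add_feedback(history, evaluation):
    note = f"previous attempt scored {evaluation['score']}: {evaluation['reason']}"
    return history + [note]

history = add_feedback([], {"score": 0.0, "reason": "refusal phrase present"})
```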

Keeping these roles separate makes paper workflows easier to reproduce and inspect.
