Use a built-in metric
Every run lets you pick which metrics to track, so you don’t have to create anything first.
Launch an evaluation
From the Launch Simulations page, start a run as usual (agent, persona, test set). See Simulations.
Choose metrics to track
In the launch panel, pick one or more built-in metrics — for example Latency, End Reason, or Transcript Sentiment. Not sure which? See Choose a metric.
Read the results
When the run finishes, open it from Runs. Each simulation shows a metric card with its score; click into a simulation to see the score explained against the transcript.
Create your own metric
When the built-ins don’t capture your exact success criteria, author a custom metric. The most common starting point is a Binary LLM Judge: a yes/no question about the transcript.
Pick a metric type
Choose the type that fits what you’re measuring. For a yes/no check, pick Binary LLM Judge. Browse all types in the Metric Library.
Configure it
Give the metric a name and supply what that type needs — a prompt for an LLM judge, a pattern for a regex check, or a threshold for a numeric metric. For prompts, follow Write judge prompts.
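To make the three kinds of input concrete, here is a minimal Python sketch of what each metric type needs. It is an illustration only: the class names, fields, and sample transcript are hypothetical and are not the product's schema or API. In the product, metrics are configured through the UI.

```python
import re
from dataclasses import dataclass


# Hypothetical shapes for illustration; not the product's actual schema.
@dataclass
class RegexMetric:
    name: str
    pattern: str

    def score(self, transcript: str) -> bool:
        # Passes when the pattern appears anywhere in the transcript.
        return re.search(self.pattern, transcript, flags=re.IGNORECASE) is not None


@dataclass
class ThresholdMetric:
    name: str
    max_value: float

    def score(self, value: float) -> bool:
        # Passes when the measured value is at or below the threshold.
        return value <= self.max_value


@dataclass
class BinaryJudgeMetric:
    name: str
    question: str

    def build_prompt(self, transcript: str) -> str:
        # A binary judge asks one yes/no question about the transcript.
        return (
            f"{self.question}\n"
            "Answer with exactly YES or NO.\n\n"
            f"Transcript:\n{transcript}"
        )


transcript = "Agent: Your appointment is confirmed for Tuesday at 3 pm."
confirmed = RegexMetric("Mentions confirmation", r"\bconfirmed\b")
fast_enough = ThresholdMetric("Latency under 1.5 s", max_value=1.5)
judge = BinaryJudgeMetric("Booking confirmed", "Did the agent confirm the appointment?")
```

Each type reduces to a name plus one type-specific input, which is why the configuration step stays short regardless of which type you pick.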
Test and refine
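Open the metric and click Improve Metric to run it against real transcripts and check how often it returns YES vs. NO. Tighten the wording until results are consistent across similar transcripts. The strongest signal here is human review: compare the metric’s verdicts with a reviewer’s labels for the same transcripts.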
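If you want to quantify that comparison outside the product, a minimal Python sketch might look like the following. It assumes you have two equal-length lists of YES/NO labels for the same transcripts, one from the metric and one from a human reviewer. The function name `summarize` and the sample labels are hypothetical.

```python
from collections import Counter


def summarize(judge_labels, human_labels):
    """Count YES/NO verdicts and measure agreement with human labels."""
    if not judge_labels or len(judge_labels) != len(human_labels):
        raise ValueError("label lists must be non-empty and the same length")
    counts = Counter(judge_labels)
    # Agreement is the fraction of transcripts where both labels match.
    matches = sum(j == h for j, h in zip(judge_labels, human_labels))
    return {
        "yes": counts["YES"],
        "no": counts["NO"],
        "agreement": matches / len(judge_labels),
    }


# Hypothetical labels for four transcripts.
judge_labels = ["YES", "NO", "YES", "YES"]
human_labels = ["YES", "NO", "NO", "YES"]
summary = summarize(judge_labels, human_labels)
```

A low agreement rate suggests the prompt wording is ambiguous. The transcripts where the two labels differ are the most useful ones to reread before editing the prompt.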
Where to go next
Choose a metric
Find the right metric for what you want to measure.
Write judge prompts
Make your LLM-judge metrics reliable.
Configure metrics
Template variables, transcript scope, trace context, thresholds.
Metric Library
Every metric type, in depth.