A metric only produces scores once it’s attached to a run. There are two ways to get started: use one of Coval’s built-in metrics, or create your own.

Use a built-in metric

Every run lets you pick which metrics to track — you don’t have to create anything first.
1. Launch an evaluation

From the Launch Simulations page, start a run as usual by choosing an agent, a persona, and a test set. See Simulations.
2. Choose metrics to track

In the launch panel, pick one or more built-in metrics — for example Latency, End Reason, or Transcript Sentiment. Not sure which? See Choose a metric.
3. Read the results

When the run finishes, open it from Runs. Each simulation shows a metric card with its score; click into a simulation to see an explanation of the score alongside its transcript.

Create your own metric

When the built-ins don’t capture your exact success criteria, author a custom metric. The most common starting point is a Binary LLM Judge — a yes/no question about the transcript.
1. Open Metrics and add a new one

Go to Metrics in the sidebar and click New Metric.
2. Pick a metric type

Choose the type that fits what you’re measuring. For a yes/no check, pick Binary LLM Judge. Browse all types in the Metric Library.
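The types differ in what they compute from a transcript. As a rough illustration only, the Python below shows the kind of check two non-LLM types perform: a regex check that passes when an assistant turn matches a pattern, and a numeric value compared against a threshold. The `Turn` class, the function names, and the "lower is better" direction of the threshold are assumptions made for this sketch, not Coval's API or its implementation.

```python
import re
from dataclasses import dataclass


@dataclass
class Turn:
    """Hypothetical transcript turn: who spoke and what they said."""
    role: str  # "user" or "assistant"
    text: str


def regex_check(transcript: list[Turn], pattern: str) -> bool:
    """Hypothetical regex metric: True if any assistant turn matches `pattern`."""
    compiled = re.compile(pattern, re.IGNORECASE)
    return any(compiled.search(t.text) for t in transcript if t.role == "assistant")


def threshold_check(value: float, threshold: float) -> bool:
    """Hypothetical numeric metric: passes when `value` is at or below `threshold`
    (this sketch assumes lower is better, as with a response time)."""
    return value <= threshold


transcript = [
    Turn("user", "Can I book a cleaning next week?"),
    Turn("assistant", "You're booked for Tuesday, March 4 at 10:00 AM."),
]

# Does the assistant state a clock time such as "10:00 AM"?
print(regex_check(transcript, r"\b\d{1,2}:\d{2}\s?(AM|PM)\b"))  # True
# Is a made-up 1.8-second value within a 2.0-second threshold?
print(threshold_check(value=1.8, threshold=2.0))  # True
```

Only assistant turns are searched here, so the word "cleaning" (spoken by the user) would not satisfy a regex check for it.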
3. Configure it

Give the metric a name and supply the inputs its type needs: a prompt for an LLM judge, a pattern for a regex check, or a threshold for a numeric metric. For prompts, follow Write judge prompts. For example, a Binary LLM Judge prompt might read:

    Given the transcript, did the assistant confirm the appointment date and time
    before ending the call?

    Return YES if the assistant restated both the date and the time and the user
    acknowledged them. Return NO otherwise.
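Coval runs the judge for you, but it can help to see how a yes/no prompt like the example above becomes a binary score. The sketch below assumes the transcript is appended to the prompt and that the model's first word is its verdict; `format_transcript`, `parse_verdict`, `run_binary_judge`, and the `call_llm` callable are hypothetical names, and `fake_llm` stands in for a real model call.

```python
from typing import Callable, Optional

# The example prompt from this step, with a hypothetical {transcript} slot appended.
JUDGE_PROMPT = """Given the transcript, did the assistant confirm the appointment date and time
before ending the call?

Return YES if the assistant restated both the date and the time and the user
acknowledged them. Return NO otherwise.

Transcript:
{transcript}"""


def format_transcript(turns: list[tuple[str, str]]) -> str:
    """Render (role, text) pairs as one "role: text" line per turn."""
    return "\n".join(f"{role}: {text}" for role, text in turns)


def parse_verdict(reply: str) -> Optional[bool]:
    """Map the judge's reply to True (YES), False (NO), or None if unparseable."""
    words = reply.strip().split()
    if not words:
        return None
    first = words[0].strip(".,:;!").upper()
    if first == "YES":
        return True
    if first == "NO":
        return False
    return None


def run_binary_judge(
    turns: list[tuple[str, str]], call_llm: Callable[[str], str]
) -> Optional[bool]:
    """Fill the prompt, send it to the (hypothetical) model, parse the verdict."""
    prompt = JUDGE_PROMPT.format(transcript=format_transcript(turns))
    return parse_verdict(call_llm(prompt))


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call; always answers YES."""
    return "YES - the date and time were restated and acknowledged."


turns = [
    ("user", "Can I come in Tuesday?"),
    ("assistant", "Confirmed: Tuesday, March 4 at 10:00 AM. Does that work?"),
    ("user", "Yes, that works."),
]
print(run_binary_judge(turns, fake_llm))  # True
```

Returning `None` for an unparseable reply, rather than guessing, keeps malformed judge output from being silently counted as a pass or a fail.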
4. Test and refine

Open the metric and click Improve Metric to run it against real transcripts and see how often it returns YES versus NO. Tighten the prompt wording until the verdicts are consistent. Human review gives the strongest signal: check the metric's verdicts against your own reading of the same transcripts.
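To put numbers on how often the metric returns YES and how closely it agrees with human review, a minimal sketch might look like the following. It assumes you have the metric's verdicts and your own labels for the same transcripts as parallel lists; the function names are hypothetical and the values are made up for illustration.

```python
def yes_rate(verdicts: list[bool]) -> float:
    """Fraction of transcripts the metric judged YES."""
    if not verdicts:
        raise ValueError("need at least one verdict")
    return sum(verdicts) / len(verdicts)


def agreement(metric: list[bool], human: list[bool]) -> float:
    """Fraction of transcripts where the metric's verdict matches the human label."""
    if not metric or len(metric) != len(human):
        raise ValueError("need two non-empty lists of the same length")
    return sum(m == h for m, h in zip(metric, human)) / len(metric)


def disagreements(metric: list[bool], human: list[bool]) -> list[int]:
    """Indices of transcripts where metric and human disagree: re-read these first."""
    return [i for i, (m, h) in enumerate(zip(metric, human)) if m != h]


# Made-up example data: one verdict and one human label per transcript.
metric_verdicts = [True, True, False, True, False, True]
human_labels = [True, False, False, True, False, True]

print(f"YES rate:  {yes_rate(metric_verdicts):.2f}")  # YES rate:  0.67
print(f"Agreement: {agreement(metric_verdicts, human_labels):.2f}")  # Agreement: 0.83
print(disagreements(metric_verdicts, human_labels))  # [1]
```

The disagreement indices point at the transcripts worth re-reading before you tighten the prompt wording.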
5. Attach it to a run

Save the metric, then select it when you launch your next evaluation, exactly as you would a built-in metric. Its scores appear alongside the other metrics in the run results.

Where to go next

Choose a metric

Find the right metric for what you want to measure.

Write judge prompts

Make your LLM-judge metrics reliable.

Configure metrics

Template variables, transcript scope, trace context, thresholds.

Metric Library

Every metric type, in depth.