# Choose a metric

> Find the right metric by what you want to measure.

The [Metric Library](/concepts/metrics/overview#the-metric-library) is organized by *how* each metric works. This page flips that around: start from **what you want to measure** and jump straight to the metrics that measure it.

<Tip>
  **Just starting out?** A solid baseline for most agents is [Latency](/concepts/metrics/types/statistical#latency), [End Reason](/concepts/metrics/types/deterministic#end-reason), and one custom [Binary LLM Judge](/concepts/metrics/types/llm-judge#binary-llm-judge) for your specific success criteria. Add more as you learn where your agent struggles.
</Tip>

## Did the agent do its job? (task resolution & correctness)

| Metric                                                                               | What it tells you                                           |
| ------------------------------------------------------------------------------------ | ----------------------------------------------------------- |
| [Binary LLM Judge](/concepts/metrics/types/llm-judge#binary-llm-judge)               | A yes/no answer to your exact success question              |
| [Composite Evaluation](/concepts/metrics/types/llm-judge#composite-evaluation)       | Several pass/fail criteria scored together                  |
| [End Reason](/concepts/metrics/types/deterministic#end-reason)                       | How the conversation ended (resolved, transferred, dropped) |
| [Match Expected Output](/concepts/metrics/types/deterministic#match-expected-output) | Whether the outcome matches a known-correct value           |

## Was it fast and responsive? (latency & reliability)

| Metric                                                                                     | What it tells you                         |
| ------------------------------------------------------------------------------------------ | ----------------------------------------- |
| [Latency](/concepts/metrics/types/statistical#latency)                                     | How long the agent takes to respond       |
| [Time to First Audio](/concepts/metrics/types/statistical#time-to-first-audio)             | Delay before the agent starts speaking    |
| [Agent Fails to Respond](/concepts/metrics/types/deterministic#agent-fails-to-respond)     | Turns where the agent went silent         |
| [Agent Needs Reprompting](/concepts/metrics/types/deterministic#agent-needs-reprompting)   | Times the user had to repeat themselves   |
| [Agent Repeats Itself](/concepts/metrics/types/statistical#agent-repeats-itself)           | Looping or repeated agent responses       |
| [LLM / STT / TTS Time to First Byte](/concepts/metrics/types/trace#llm-time-to-first-byte) | Where latency comes from in your pipeline |

## Does it sound natural? (voice quality — voice agents)

| Metric                                                                                     | What it tells you                              |
| ------------------------------------------------------------------------------------------ | ---------------------------------------------- |
| [Interruption Rate](/concepts/metrics/types/statistical#interruption-rate)                 | How often the agent talks over the user        |
| [Speech Tempo](/concepts/metrics/types/statistical#speech-tempo)                           | Whether the agent speaks too fast or too slow  |
| [Pitch Variability](/concepts/metrics/types/statistical#pitch-variability)                 | Monotone vs. natural intonation                |
| [Volume-Pitch Misalignment](/concepts/metrics/types/statistical#volume-pitch-misalignment) | Unnatural volume/pitch mismatches              |
| [Voice Quality](/concepts/metrics/types/statistical#voice-quality)                         | Overall acoustic quality of the agent's speech |

See the [Statistical metrics](/concepts/metrics/types/statistical) page for the full set of acoustic and prosody checks (background noise, artifacts, vocal fry, and more).

## How did the customer feel? (sentiment & experience)

| Metric                                                                         | What it tells you                                        |
| ------------------------------------------------------------------------------ | -------------------------------------------------------- |
| [Transcript Sentiment](/concepts/metrics/types/ml-model#transcript-sentiment)  | Sentiment inferred from the text                         |
| [Audio Sentiment](/concepts/metrics/types/ml-model#audio-sentiment)            | Sentiment inferred from the audio (tone, not just words) |
| [Composite Evaluation](/concepts/metrics/types/llm-judge#composite-evaluation) | A blended experience score across criteria you define    |

## Did it follow the rules? (compliance & scripting)

| Metric                                                                                                                                          | What it tells you                                                              |
| ----------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------ |
| [Transcript Regex Match](/concepts/metrics/types/deterministic#transcript-regex-match)                                                          | Whether required phrases appear; use absent-match to enforce "never say X"     |
| [Binary](/concepts/metrics/types/llm-judge#binary-llm-judge) / [Categorical LLM Judge](/concepts/metrics/types/llm-judge#categorical-llm-judge) | Policy, tone, and disclosure checks                                            |

If your agent follows a defined call flow, add **Workflow Verification** to catch off-path behavior. Generate the workflow in the [Agent creation flow](/concepts/agents/workflow), and the metric traces that workflow against the transcript.
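
Required-phrase and "never say X" patterns for Transcript Regex Match can be prototyped locally before you configure the metric. The sketch below is a plain Python illustration, not Coval's implementation: `check_phrases`, the sample transcript, and both patterns are hypothetical, and case-insensitive matching is an assumption.

```python
import re


def check_phrases(transcript, required, forbidden):
    """Return (missing_required, forbidden_hits) for one transcript.

    Hypothetical helper: `required` patterns must match somewhere in the
    transcript, `forbidden` patterns must not match anywhere.
    """
    missing = [p for p in required if not re.search(p, transcript, re.IGNORECASE)]
    hits = [p for p in forbidden if re.search(p, transcript, re.IGNORECASE)]
    return missing, hits


# Made-up transcript used only for illustration.
transcript = (
    "Agent: Thanks for calling. This call may be recorded for quality purposes.\n"
    "User: I want a refund.\n"
    "Agent: I can guarantee you a refund today."
)

required = [r"this call may be recorded"]   # disclosure that must appear
forbidden = [r"\bguarantee[ds]?\b"]          # wording the agent must never use

missing, hits = check_phrases(transcript, required, forbidden)
print("missing required:", missing)
print("forbidden hits:", hits)
```

Here `missing` is empty because the disclosure appears, while `hits` contains the `guarantee` pattern, which is the kind of result an absent-match rule is meant to flag.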

## Did it do the right things behind the scenes? (tools & traces)

| Metric                                                                              | What it tells you                                                |
| ----------------------------------------------------------------------------------- | ---------------------------------------------------------------- |
| [Tool Call Count](/concepts/metrics/types/trace#tool-call-count)                    | Whether the agent called the tools it should have                |
| [Custom Trace](/concepts/metrics/types/trace#custom-trace)                          | Any value extracted from your OpenTelemetry spans                |
| [API State](/concepts/metrics/types/deterministic#api-state)                        | State your backend reports for the call                          |
| LLM Judge with [Trace context](/concepts/metrics/configuring-metrics#trace-context) | Verify tool usage and order not visible in the transcript        |

## Was the audio transcribed accurately? (STT accuracy)

| Metric                                                                      | What it tells you                                 |
| --------------------------------------------------------------------------- | ------------------------------------------------- |
| [STT Word Error Rate](/concepts/metrics/types/trace#stt-word-error-rate)    | How accurately speech was transcribed             |
| [Transcription Error](/concepts/metrics/types/ml-model#transcription-error) | Likely transcription mistakes in the conversation |
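
For intuition about what a word error rate means: it is conventionally defined as the number of word-level substitutions, deletions, and insertions needed to turn the transcribed text into a reference text, divided by the number of reference words. The sketch below computes that textbook definition. The function name and the lowercase, whitespace-split normalization are assumptions for illustration, and Coval's metric may normalize text differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,                       # deletion
                    curr[j - 1] + 1,                   # insertion
                    prev[j - 1] + substitution_cost,   # match or substitution
                )
            )
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("my" -> "the") and one deletion ("today"):
# 2 errors over 5 reference words.
wer = word_error_rate("please cancel my order today", "please cancel the order")
print(wer)
```

A value of 0 means the transcript matches the reference word for word. Values can exceed 1 when the transcript contains many extra words.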

## What did it cost? (usage)

| Metric                                                           | What it tells you                |
| ---------------------------------------------------------------- | -------------------------------- |
| [LLM Token Usage](/concepts/metrics/types/trace#llm-token-usage) | Tokens consumed per conversation |

***

Once you know which metrics you want, [add them to a run](/concepts/metrics/quickstart). For anything custom, see [Write judge prompts](/concepts/metrics/writing-judge-prompts) and [Configure metrics](/concepts/metrics/configuring-metrics).