- Set up an IVR Flow Adherence metric — describe your menu.
- Generate the test set — one click turns the menu into test cases.
- Run the simulation — test the agent and read the results.
1. Set up an IVR Flow Adherence metric
Create a new metric and choose IVR Flow Adherence. The metric holds a description of your phone menu (the IVR flow) as a tree of:
- Nodes: each one is a prompt the agent is expected to say (e.g. “Press 1 for billing, 2 for support”). Mark the end-of-call nodes as terminal.
- Edges: what the caller does to move from one node to the next: a keypad digit (1), a pattern (\d{4} for a 4-digit PIN), or a spoken keyword (operator). Each edge has a match type: literal, regex, or fuzzy.
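The node and edge structure described above can be pictured with a small data model. The following is a minimal sketch, not Coval's schema or implementation: the class names, field names, the `edge_matches` helper, and the `fuzzy_cutoff` value are all hypothetical, and `difflib.SequenceMatcher` is only a stand-in for whatever fuzzy matching the product actually uses.

```python
import re
from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class Edge:
    """One caller action that moves the call to `target` (hypothetical model)."""
    pattern: str     # e.g. "1", r"\d{4}", or "operator"
    match_type: str  # "literal", "regex", or "fuzzy"
    target: str      # id of the node this edge leads to


@dataclass
class Node:
    """One prompt the agent is expected to say (hypothetical model)."""
    node_id: str
    prompt: str
    terminal: bool = False
    edges: list[Edge] = field(default_factory=list)


def edge_matches(edge: Edge, caller_input: str, fuzzy_cutoff: float = 0.8) -> bool:
    """Return True if the caller's input satisfies the edge."""
    text = caller_input.strip()
    if edge.match_type == "literal":
        return text == edge.pattern
    if edge.match_type == "regex":
        # fullmatch: the whole input must fit the pattern, not just a prefix
        return re.fullmatch(edge.pattern, text) is not None
    if edge.match_type == "fuzzy":
        # Assumed similarity measure; the cutoff is an illustrative value
        ratio = SequenceMatcher(None, edge.pattern.lower(), text.lower()).ratio()
        return ratio >= fuzzy_cutoff
    raise ValueError(f"unknown match type: {edge.match_type}")
```

In this sketch a regex edge uses a full match, so a five-digit input does not satisfy a \d{4} edge.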

You can build the flow in three ways:
- Workflow: a visual drag-and-drop editor. Add nodes, set the start node, mark terminal nodes, and draw edges between them.
- Generate: describe your IVR in plain language and let Coval draft the tree for you. For example:
  A bank IVR that greets the caller, asks them to press 1 for account access or 2 for a new account, and falls back to an operator if the caller says “operator”.
  Type “create example tree” to drop in a ready-made sample to learn from.
- JSON: paste or edit the flow directly.
IVR flow JSON shape
The match threshold (0.85) controls how closely the agent’s spoken prompt has to match a node’s expected prompt to count as a match. Raise it to be stricter about exact wording; lower it to allow more paraphrasing.
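To see how a threshold like this behaves, here is a rough sketch of threshold-based prompt matching. It assumes a simple character-level similarity from Python's `difflib`; the similarity measure Coval actually uses is not stated here, and `prompt_matches` is a hypothetical name.

```python
from difflib import SequenceMatcher


def prompt_matches(expected: str, spoken: str, threshold: float = 0.85) -> bool:
    """Compare two prompts with an assumed similarity measure and a threshold."""
    score = SequenceMatcher(None, expected.lower(), spoken.lower()).ratio()
    return score >= threshold


expected = "Press 1 for billing, 2 for support"
paraphrase = "Press 1 for billing or 2 for support"

# A light paraphrase clears 0.85 under this measure but not a near-exact 0.99
print(prompt_matches(expected, paraphrase))
print(prompt_matches(expected, paraphrase, threshold=0.99))
```

Raising the threshold rejects the paraphrase; lowering it accepts looser wording, which mirrors the trade-off described above.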
Save the metric when your flow is complete.
2. Generate the test set
On the metric, click Generate test set. The button shows how many paths Coval found, for example Generate test set (6 paths). Coval enumerates every route from the start node to each terminal node and creates one test case per path. Each is a Script test case that replays the caller’s inputs along that path:
- Keypad inputs become DTMF turns (1, 2, a generated PIN for a \d{4} edge).
- Spoken keywords (fuzzy edges like operator) stay as text turns.
- The exact nodes the path should visit are saved as the test case’s expected output, so the metric knows the correct route.
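The path enumeration described above can be sketched as a depth-first walk. This is an illustration under assumptions, not Coval's algorithm: the dict layout, the `enumerate_paths` name, and the choice to skip edges that revisit a node (so loops terminate) are all hypothetical, and the sample menu is loosely based on the bank example earlier.

```python
def enumerate_paths(flow: dict, start: str) -> list[dict]:
    """Collect one test case per route from `start` to a terminal node."""
    cases = []

    def walk(node_id: str, inputs: list[str], visited: list[str]) -> None:
        node = flow[node_id]
        if node["terminal"]:
            # Caller inputs to replay, plus the nodes the path should visit
            cases.append({"inputs": inputs, "expected_nodes": visited})
            return
        for caller_input, target in node["edges"]:
            if target in visited:  # assumption: skip loops so the walk ends
                continue
            walk(target, inputs + [caller_input], visited + [target])

    walk(start, [], [start])
    return cases


# Hypothetical bank menu; "4821" stands in for a generated 4-digit PIN
flow = {
    "greeting": {"terminal": False, "edges": [("1", "account_access"),
                                              ("2", "new_account"),
                                              ("operator", "operator")]},
    "account_access": {"terminal": False, "edges": [("4821", "balance")]},
    "balance": {"terminal": True, "edges": []},
    "new_account": {"terminal": True, "edges": []},
    "operator": {"terminal": True, "edges": []},
}

cases = enumerate_paths(flow, "greeting")
print(len(cases))  # 3
```

Each resulting case pairs the inputs to replay with the expected node sequence, which is the information the metric needs to judge the route.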
DTMF or speech. You control whether the generated caller inputs are dialed or spoken through how you define each edge: keypad digits (1, *, #) generate DTMF turns the caller presses, while keyword edges generate spoken text. Use digit edges to test keypad navigation, keyword edges to test voice navigation, or mix both.
Coval creates a test set named IVR Test Set – <your metric name> and opens it. Review the generated cases and tweak any wording before you run.
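The DTMF-versus-speech rule can be illustrated with a short sketch. It assumes that any input made only of keypad characters (0-9, *, #) is dialed and everything else is spoken, and that a regex edge of the form \d{n} is filled with n random digits. The function names and turn dictionaries are hypothetical, not Coval's test-case format.

```python
import random
import re

KEYPAD_ONLY = re.compile(r"[0-9*#]+")
DIGIT_RUN = re.compile(r"\\d\{(\d+)\}")  # matches patterns such as \d{4}


def sample_input(pattern: str, match_type: str) -> str:
    """Produce a concrete caller input for an edge (illustrative only)."""
    if match_type == "regex":
        run = DIGIT_RUN.fullmatch(pattern)
        if run is None:
            raise ValueError("this sketch only handles \\d{n} patterns")
        return "".join(random.choice("0123456789") for _ in range(int(run.group(1))))
    return pattern  # literal digits and keywords are used as written


def to_turn(caller_input: str) -> dict:
    """Classify one caller input as a DTMF turn or a spoken text turn."""
    if KEYPAD_ONLY.fullmatch(caller_input):
        return {"type": "dtmf", "value": caller_input}
    return {"type": "text", "value": caller_input}


print(to_turn("1"))
print(to_turn("operator"))
print(to_turn(sample_input(r"\d{4}", "regex"))["type"])
```

Mixing both kinds of edge in one flow would then yield test cases that interleave DTMF and text turns.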
3. Run the simulation
Launch a run with:
- your voice agent
- the generated IVR test set
- the IVR Flow Adherence metric
- a voice persona — the script controls the keypresses and replies, so any neutral voice works
Read the results
Open a simulation and look at the IVR Flow Adherence metric. The score is the fraction of the nodes the caller walked whose prompt matched the flow, so 1.0 (100%) means every expected prompt was matched and every keypress routed as designed.
Adherence is pass/fail, not a grade. Only a perfect 1.0 counts as a pass (green) — any divergence, even a single mismatched prompt or a keypress that routed to the wrong place, fails (red). Treat anything below 100% as a failure to investigate, not a partial success. The deep-dive view adds descriptive bands for context — Perfect adherence (100%), Mostly adhered (75%+), Partial adherence (above 0%), and Did not adhere (0%) — but those are just severity coloring, not passing scores.
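The scoring rule above is simple arithmetic, sketched here with hypothetical function names. Returning 0.0 for an empty walk is an assumption of this sketch; the band labels and cut-offs are the ones listed in the text.

```python
def adherence_score(node_results: list[bool]) -> float:
    """Fraction of walked nodes whose prompt matched the flow."""
    if not node_results:
        return 0.0  # assumption: an empty walk scores zero
    return sum(node_results) / len(node_results)


def passed(score: float) -> bool:
    """Only a perfect score counts as a pass."""
    return score == 1.0


def band(score: float) -> str:
    """Descriptive severity band; it does not change pass/fail."""
    if score == 1.0:
        return "Perfect adherence"
    if score >= 0.75:
        return "Mostly adhered"
    if score > 0.0:
        return "Partial adherence"
    return "Did not adhere"


score = adherence_score([True, True, True, False])
print(score, passed(score), band(score))  # 0.75 False Mostly adhered
```

A walk with three of four prompts matched lands in the "Mostly adhered" band yet still fails, which is why anything below 100% deserves investigation.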
The Walk shows you what happened step by step: each node as Passed / Failed / Not visited, the first divergence (expected vs. actual prompt side by side), and each keypress with the branch it matched and where it led. Use it to decide where the fix belongs:
- Prompt wording — the agent said roughly the right thing but scored below the match threshold. Tighten the agent’s prompt, or adjust the threshold if your wording is intentionally flexible.
- Routing — a keypress led somewhere unexpected. Fix the agent’s menu handling or DTMF routing.
- Coverage — add nodes or edges to the flow and regenerate the test set to exercise more of the menu.
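The idea of a first divergence can be sketched by comparing the expected node sequence with the nodes actually visited. This is a simplification under assumptions: it compares node ids by exact equality rather than scoring prompts against a threshold, and the `first_divergence` name and return shape are hypothetical.

```python
from typing import Optional


def first_divergence(expected: list[str], actual: list[str]) -> Optional[dict]:
    """Return the first step where the walk left the expected route, if any."""
    for step, (want, got) in enumerate(zip(expected, actual)):
        if want != got:
            return {"step": step, "expected": want, "actual": got}
    if len(expected) != len(actual):
        # One sequence ended early; report the first unmatched position
        step = min(len(expected), len(actual))
        return {
            "step": step,
            "expected": expected[step] if step < len(expected) else None,
            "actual": actual[step] if step < len(actual) else None,
        }
    return None


print(first_divergence(["greeting", "billing"], ["greeting", "support"]))
# {'step': 1, 'expected': 'billing', 'actual': 'support'}
```

A divergence at a step that follows a keypress points toward routing, while a divergence in wording at an otherwise correct node points toward the prompt or the threshold.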