Changelog - Coval Documentation

Jun 15, 2026

7 updates

Highlights

Metric versioning with full change history
Standardized LLM-judge metric templates
AI-generated run summaries (beta)
Project-scoped Human Review pages

Metric versioning

Metrics now keep a version history, so you can see how a metric changed over time and review prior versions.

Each metric keeps a record of its prior states, newest first. You can review the history in the app or pull it through the v1 API.
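To illustrate working with a newest-first history, the sketch below orders version records and reports which fields changed between the two most recent versions. The record shape (`version`, `name`, `prompt`, `updated_at`) is hypothetical and is not taken from the v1 API, so check the API reference for the actual field names.

```python
def newest_first(history):
    """Return version records ordered newest first by version number."""
    return sorted(history, key=lambda record: record["version"], reverse=True)


def changed_fields(old, new, ignore=("version", "updated_at")):
    """List the field names whose values differ between two versions."""
    keys = sorted(set(old) | set(new))
    return [k for k in keys if k not in ignore and old.get(k) != new.get(k)]


# Hypothetical records, for illustration only.
history = [
    {"version": 1, "name": "Politeness",
     "prompt": "Was the agent polite?",
     "updated_at": "2026-05-01T10:00:00Z"},
    {"version": 2, "name": "Politeness",
     "prompt": "Was the agent polite and concise?",
     "updated_at": "2026-06-10T09:30:00Z"},
]

latest, previous = newest_first(history)[:2]
print(changed_fields(previous, latest))
```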

Standardized metric templates

A library of ready-made LLM-judge metric templates is now available when you create a metric, and the metric gallery is reorganized into clearer categories.

Pick a standardized judge prompt as a starting point instead of writing one from scratch.

Run summaries (beta)

Runs now include an AI-generated summary of what happened, available in beta. Mark a summary helpful or not to help it improve.

The summary renders with clean formatting and a thumbs up or down control. As a beta feature, it should improve over time as we tune it.

Human Review project pages

Human Review now has project-scoped pages, so you can open a project’s overview and assignments on their own pages and share a direct link to them.

Each project gets its own overview and assignments view with breadcrumbs, so you can navigate to and share specific review work without losing context.

Connection validation for Pipecat, LiveKit, and WebSocket agents

Coval now validates Pipecat, LiveKit, and WebSocket agent connections when you set them up, so configuration problems surface before you run.

Bring your own background sounds

You can now upload your own background sounds for simulations, so agents can be tested against the exact ambient conditions they will face in production.

Telephony recording uploads

Telephony call recordings in MP3 format now upload reliably, including low-sample-rate recordings that were previously rejected.

Jun 6, 2026

2 updates

Highlights

New pitch variability metric
New perceived loudness (LUFS) metric

Pitch variability metric

A new metric that flags whether an agent sounds monotone or expressive across a call.
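The changelog does not say how the metric is computed. One common way to quantify monotone versus expressive speech is the standard deviation of the pitch contour in semitones, sketched below. The function names, the input format (one pitch estimate in Hz per frame, with 0 or `None` for unvoiced frames), and the 2.0 semitone threshold are all assumptions for illustration.

```python
import math
import statistics


def pitch_variability_semitones(f0_hz):
    """Std dev of voiced pitch, in semitones relative to the median pitch."""
    voiced = [f for f in f0_hz if f and f > 0]  # drop unvoiced frames
    if len(voiced) < 2:
        return 0.0
    reference = statistics.median(voiced)
    semitones = [12 * math.log2(f / reference) for f in voiced]
    return statistics.pstdev(semitones)


def label(f0_hz, threshold=2.0):
    """Classify a contour; the threshold is a hypothetical cutoff."""
    spread = pitch_variability_semitones(f0_hz)
    return "expressive" if spread >= threshold else "monotone"


flat = [120, 121, 119, 0, 120, 122]      # pitch barely moves
lively = [110, 150, 95, 0, 180, 130]     # wide pitch excursions
print(label(flat), label(lively))
```

Working in semitones rather than Hz keeps the measure comparable between low-pitched and high-pitched voices.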

Perceived loudness (LUFS)

A new metric that measures perceived loudness across a call, so you can catch audio that is too quiet or too loud.
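As a rough intuition for what a loudness check does, the sketch below computes a plain RMS level in dBFS and flags audio outside a target range. This is not a LUFS measurement: true LUFS, as defined in ITU-R BS.1770, also applies K-weighting and gating, which are omitted here. The thresholds and function names are hypothetical.

```python
import math


def rms_dbfs(samples):
    """RMS level in dBFS for float samples in the range [-1.0, 1.0]."""
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")
    return 10 * math.log10(mean_square)


def loudness_flag(level_db, quiet_below=-30.0, loud_above=-10.0):
    """Classify a level against hypothetical quiet and loud thresholds."""
    if level_db < quiet_below:
        return "too quiet"
    if level_db > loud_above:
        return "too loud"
    return "ok"


# One second of a full-scale 440 Hz tone at 48 kHz, plus attenuated copies.
tone = [math.sin(2 * math.pi * 440 * n / 48000) for n in range(48000)]
moderate = [0.1 * s for s in tone]
quiet = [0.01 * s for s in tone]

for name, audio in (("tone", tone), ("moderate", moderate), ("quiet", quiet)):
    print(name, loudness_flag(rms_dbfs(audio)))
```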

Jun 1, 2026

5 updates

Highlights

Create and delete dashboards via the API and CLI
Broader chat-agent connectivity with SSE streaming
Per-conversation metadata in metric prompts

Dashboards via API and CLI

Create and delete dashboards programmatically through the API and CLI.

Manage dashboards as part of your own workflows and scripts, without setting each one up by hand in the app. This is useful for spinning up consistent dashboards per project or per environment.

Broader chat-agent connectivity

Chat agents now support SSE streaming and a configurable response format.
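If your chat agent streams responses over SSE, it helps to know what the wire format looks like. The sketch below is a minimal parser for the `data:` field of the standard server-sent events format. It is not Coval's implementation, it ignores the `event`, `id`, and `retry` fields, and the sample stream is made up.

```python
import re


def parse_sse(text):
    """Yield the data payload of each complete event in an SSE stream."""
    data_lines = []
    for line in re.split(r"\r\n|\n|\r", text):
        if line == "":
            # A blank line ends the current event.
            if data_lines:
                yield "\n".join(data_lines)
                data_lines = []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # A single leading space after the colon is not part of the value.
            data_lines.append(value[1:] if value.startswith(" ") else value)
    # An event without a terminating blank line is incomplete and is dropped.


stream = ": keep-alive\n\ndata: Hello\n\ndata: line one\ndata: line two\n\n"
events = list(parse_sse(stream))
print(events)
```

Multiple `data:` lines within one event are joined with newlines, which is how a streamed multi-line message arrives as a single payload.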

Metadata in metric prompts

Dynamic metrics can now reference per-conversation metadata directly in the prompt template.
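Conceptually, this works like template substitution: placeholders in the prompt are filled from each conversation's metadata before the judge runs. The sketch below shows the idea. The `{{metadata.key}}` placeholder syntax and the `render_prompt` helper are assumptions for illustration, so check the docs for the syntax Coval actually uses.

```python
import re

# Hypothetical placeholder syntax: {{metadata.some_key}}
PLACEHOLDER = re.compile(r"\{\{\s*metadata\.([A-Za-z0-9_]+)\s*\}\}")


def render_prompt(template, metadata):
    """Fill metadata placeholders; fail loudly on a missing key."""
    def substitute(match):
        key = match.group(1)
        if key not in metadata:
            raise KeyError(f"missing metadata key: {key}")
        return str(metadata[key])

    return PLACEHOLDER.sub(substitute, template)


template = (
    "The caller is on the {{metadata.plan}} plan. "
    "Did the agent offer the right upgrade?"
)
rendered = render_prompt(template, {"plan": "basic"})
print(rendered)
```

Raising on a missing key, rather than leaving the placeholder in place, keeps a judge from scoring against a half-filled prompt.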

Clearer concurrency limits

When your organization runs evaluations beyond its concurrency limit, Coval stores the data but flags those evaluations as errors. You can rerun them later, once you are back within the limit.

Smoother metric authoring

Metric descriptions auto-populate and the name auto-fills when you create a metric.

May 26, 2026

1 update

Highlights

More reliable categorical audio metrics

More reliable categorical audio metrics

Categorical audio metrics now report a clear, descriptive error when no categories are configured, so misconfigurations surface right away.

May 19, 2026

7 updates

Highlights

Visual IVR Tree Builder
IVR Flow Adherence metric
Custom trace aggregations
Tags for metrics, templates, and test sets

IVR Flow Adherence metric

A built-in metric that checks whether a call follows the intended IVR navigation path, with no custom scoring logic required.

The metric scores each call against the IVR path you define, so you can see where calls deviate from the intended route. Results are reported per call and roll up across the run.
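The per-call and run-level views can be pictured with a small sketch: compare each call's path to the expected path, record where it first deviates, and report the share of calls that matched. The node names, the exact-match rule, and the helper functions are assumptions for illustration, not the built-in metric's scoring logic.

```python
def first_deviation(expected, actual):
    """Index of the first step where the paths differ, or None if they match."""
    for index, (want, got) in enumerate(zip(expected, actual)):
        if want != got:
            return index
    if len(expected) != len(actual):
        # One path is a strict prefix of the other.
        return min(len(expected), len(actual))
    return None


def adherence_rate(expected, calls):
    """Fraction of calls that followed the expected path exactly."""
    if not calls:
        return 0.0
    matches = sum(1 for path in calls if first_deviation(expected, path) is None)
    return matches / len(calls)


expected = ["main_menu", "billing", "pay_bill"]
calls = [
    ["main_menu", "billing", "pay_bill"],  # follows the path
    ["main_menu", "support"],              # wrong branch at step 1
    ["main_menu", "billing"],              # stops early at step 2
]

deviations = [first_deviation(expected, path) for path in calls]
print(deviations, adherence_rate(expected, calls))
```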

IVR Tree Builder

Define and simulate branching IVR call flows visually, directly in Coval.

Lay out the call paths a caller can take, then run simulations against the whole tree to see where agents take the wrong branch. No external diagramming or scripting needed.
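To make "the whole tree" concrete, the sketch below models a branching IVR flow as nested dictionaries and enumerates every root-to-leaf path, which is the set of routes a full simulation would need to cover. The data structure, prompt names, and keypresses are hypothetical and do not reflect how the builder stores trees.

```python
# Hypothetical IVR tree: each node has a prompt and keypress-indexed options.
tree = {
    "prompt": "main_menu",
    "options": {
        "1": {
            "prompt": "billing",
            "options": {
                "1": {"prompt": "pay_bill", "options": {}},
                "2": {"prompt": "billing_agent", "options": {}},
            },
        },
        "2": {"prompt": "support", "options": {}},
    },
}


def leaf_paths(node, keys=()):
    """Yield (keypress sequence, final prompt) for every leaf in the tree."""
    if not node["options"]:
        yield keys, node["prompt"]
        return
    for key, child in node["options"].items():
        yield from leaf_paths(child, keys + (key,))


paths = list(leaf_paths(tree))
for keys, destination in paths:
    print(" > ".join(keys), "->", destination)
```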

Custom trace aggregations

Choose how per-turn scores roll up across a multi-turn trace, for example worst-case or first-occurrence.
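The two named strategies can be sketched as follows. This assumes that higher scores are better, that "worst-case" means the lowest per-turn score, and that "first-occurrence" means the first turn that produced a score. These definitions and the mode names are assumptions for illustration.

```python
def aggregate(turn_scores, mode):
    """Roll per-turn scores up into one trace-level score.

    Turns with no score are represented as None and are skipped.
    """
    scored = [s for s in turn_scores if s is not None]
    if not scored:
        return None
    if mode == "worst_case":
        return min(scored)       # one bad turn drags the trace down
    if mode == "first_occurrence":
        return scored[0]         # only the first scored turn counts
    raise ValueError(f"unknown aggregation mode: {mode}")


turn_scores = [None, 0.9, 0.4, 0.8]
print(aggregate(turn_scores, "worst_case"))
print(aggregate(turn_scores, "first_occurrence"))
```

Worst-case suits checks where any single failure matters, such as a compliance statement. First-occurrence suits checks that only apply once, such as a greeting.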