Reports - Coval Documentation

A single run shows how your agent performed in one configuration. A Report combines multiple runs into one view so you can compare them — grouped and color-coded by a dimension you choose, with aggregated statistics and a shareable link.

Creating a report

Reports are built from the Runs page, in one of two ways:

From selected runs — turn on select mode, check the runs you want to compare (at least two), and click Create Report From Selection in the action bar.
From the current filters — with filters applied to the runs list, click Multi-run report to build a report from every run matching those filters.

A report includes up to 50 runs; if more match, only the first 50 are used. You can also create a report from a scheduled run: open the scheduled run and click Create Report, then choose a timeframe (past 24 hours, week, 30 days, or a custom range) to pull its runs into a report.

Designing the comparison

A report is only as useful as the runs in it. Change one variable at a time and keep the others constant. To compare two agents, run both against the same personas and the same test sets. Any difference in scores can then be attributed to the agent rather than to the personas or test cases. A test matrix is a reliable way to set this up: choose the variable you’re testing, then run each of its values against every combination of the other dimensions.

	Persona: Calm	Persona: Impatient
Agent A	run	run
Agent B	run	run

From a matrix like this, comparing by Agent isolates the agent and comparing by Persona isolates the persona, using the same set of runs.
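As a rough illustration of the matrix idea, the sketch below enumerates every agent and persona combination while holding the test set constant. It is plain Python and does not call any Coval API; the dimension values and the `build_matrix` helper are hypothetical names chosen for this example.

```python
from itertools import product

# Hypothetical dimension values; illustrative only, not Coval identifiers.
agents = ["Agent A", "Agent B"]
personas = ["Calm", "Impatient"]
test_sets = ["billing-basics"]  # held constant across every run


def build_matrix(agents, personas, test_sets):
    """Return one planned run per (agent, persona, test set) combination."""
    return [
        {"agent": agent, "persona": persona, "test_set": test_set}
        for agent, persona, test_set in product(agents, personas, test_sets)
    ]


matrix = build_matrix(agents, personas, test_sets)
for planned_run in matrix:
    print(planned_run)
```

Because every agent is paired with the same personas and test set, the resulting runs can be compared by Agent or by Persona without one dimension confounding the other.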

Compare By

The Compare by dropdown groups the simulations by a dimension and lines each metric up across the groups.

Compare by	Groups simulations by…
None	Nothing — all rows shown together
Run	The run each simulation belongs to
Agent	The agent that ran the simulation
Mutation	The agent mutation applied
Persona	The persona (simulated user)
Test case	The specific test case input
Metadata	A custom metadata key you choose
+ Create dimension	A custom dimension you define — group runs however you like

The mechanic is the same for each option; only the dimension changes. Compare by Agent to see where one build differs from another, by Persona to see how different users affect results, by Metadata to group on a key you set at launch (such as environment, version, or region), or create a custom dimension to group runs yourself — for example, grouping several builds as “v1” and “v2”. You can also add a secondary Compare By to nest one dimension inside another — for example, group by Agent, then by Persona within each agent.
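Conceptually, Compare by behaves like a group-by over simulation rows, and a secondary Compare By nests a second group-by inside the first. The sketch below shows that idea in plain Python under stated assumptions: the record fields (`agent`, `persona`, `score`) and the `compare_by` function are hypothetical and are not part of Coval's product or API.

```python
from collections import defaultdict

# Hypothetical simulation records; field names are assumptions for illustration.
simulations = [
    {"agent": "Agent A", "persona": "Calm", "score": 0.92},
    {"agent": "Agent A", "persona": "Impatient", "score": 0.71},
    {"agent": "Agent B", "persona": "Calm", "score": 0.88},
    {"agent": "Agent B", "persona": "Impatient", "score": 0.83},
]


def compare_by(rows, primary, secondary=None):
    """Group rows by a primary dimension, optionally nesting a secondary one."""
    if secondary is None:
        groups = defaultdict(list)
        for row in rows:
            groups[row[primary]].append(row)
        return dict(groups)

    nested = defaultdict(lambda: defaultdict(list))
    for row in rows:
        nested[row[primary]][row[secondary]].append(row)
    # Convert the nested defaultdicts to plain dicts for predictable lookups.
    return {key: dict(inner) for key, inner in nested.items()}


by_agent = compare_by(simulations, "agent")
by_agent_then_persona = compare_by(simulations, "agent", "persona")
```

Switching the `primary` argument from `"agent"` to `"persona"` regroups the same rows, which mirrors how the same set of runs can answer different questions depending on the dimension chosen.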

Reading the comparison

Row vs. grouped view: Row view (default) shows each simulation as its own row, color-coded by group. Grouped view collapses each group into one aggregated row; click a group to expand it.
Aggregation: In grouped view, choose how each group is summarized: Average, Median, P95, Min, or Max. P95 and Min are useful for understanding worst-case results.
Focus on one metric: Click a metric card to filter the table to that metric; click All Metrics to return.

Saving, sharing, and deleting

Click Save Report to store the runs and view configuration; rename it with the pencil icon. To share, open the report and click Share → Publish shareable link to generate a public URL that can be viewed without an account. Published reports show a Public badge, and Unpublish all revokes access. To remove a report, use the three-dot menu → Delete; this does not delete the underlying runs.

Links copied from the Reports list won’t open until the report is published.
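For readers who want to see what the grouped-view aggregation options (Average, Median, P95, Min, Max) compute, here is a minimal sketch in plain Python. It assumes the nearest-rank definition of the 95th percentile; the method Coval actually uses is not stated here, and the `p95` and `summarize` helpers and the sample latency values are hypothetical.

```python
import math
import statistics


def p95(values):
    """95th percentile by the nearest-rank method (an assumed definition)."""
    ordered = sorted(values)
    rank = math.ceil(0.95 * len(ordered))  # 1-based rank into the sorted list
    return ordered[rank - 1]


AGGREGATIONS = {
    "Average": statistics.mean,
    "Median": statistics.median,
    "P95": p95,
    "Min": min,
    "Max": max,
}


def summarize(groups, aggregation):
    """Collapse each group's values into one number using the chosen aggregation."""
    aggregate = AGGREGATIONS[aggregation]
    return {name: aggregate(values) for name, values in groups.items()}


# Hypothetical latency values in seconds, grouped by agent.
latency_by_agent = {
    "Agent A": [1.0, 1.2, 1.1, 4.0],
    "Agent B": [1.5, 1.6, 1.4, 1.7],
}

for option in AGGREGATIONS:
    print(option, summarize(latency_by_agent, option))
```

In this sample, Agent A has the lower median but a single slow simulation dominates its P95 and Max, which is the kind of worst-case behavior an average alone can hide. For a metric where higher is worse, such as latency, P95 surfaces the bad tail; for a metric where higher is better, Min plays that role.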

