Overview
Custom Trace Metrics let you extract a specific numerical value from your agent’s OpenTelemetry spans and aggregate it across all turns in a simulation.
Use Custom Trace Metrics when you have a signal already captured in your traces — latency measurements, confidence scores, token counts, retry attempts — that you want to track and trend across runs.
Prerequisites
Your agent must be instrumented with OpenTelemetry and sending spans to Coval. See the OpenTelemetry Traces guide for setup instructions. If traces are not present for a simulation, the metric will report an error at execution time.
Configuration
When creating a Custom Trace Metric, configure three fields:
| Field | Description |
|---|---|
| Span Name | The name of the OTel span to query (e.g. llm, tts, stt, llm_tool_call, or any custom span name you emit). |
| Metric Attribute | The span attribute to extract the value from (e.g. retrieval_latency_ms, confidence_score, or another custom numeric attribute key). |
| Aggregation Method | How to aggregate the extracted values across all matching spans in the simulation. |
Aggregation Methods
| Method | Description |
|---|---|
| Average | Mean value across all matching spans. Best for typical-case latency or scores. |
| Median | Median value across all matching spans. More robust to outliers than average. |
| p90 | 90th-percentile value. Best for understanding tail performance at scale. |
| p95 | 95th-percentile value. Useful for tail latency on larger samples. |
| p99 | 99th-percentile value. Useful for rare but severe latency spikes. |
| Max | Maximum value observed across all matching spans. Useful for worst-case detection. |
| Min | Minimum value observed across all matching spans. |
| Sum | Total value across all matching spans. Useful for token counts, cost-like counters, and accumulated work. |
| Count | Number of matching spans. Useful for tool calls, retries, fallbacks, handoffs, and critical events. |
| Error Rate | Percentage of matching spans with an error status. |
| Success Rate | Percentage of matching spans with a successful status. |
For count, error_rate, and success_rate, the metric can aggregate matching spans directly. For numeric aggregations such as average, p95, or sum, choose a numeric span attribute.
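To make the interaction between Span Name, Metric Attribute, and Aggregation Method concrete, here is a minimal Python sketch of how such an aggregation could work. It is an illustration only, not Coval's implementation: the span dictionaries, the `aggregate` and `nearest_rank` helpers, the `"ok"`/`"error"` status strings, and the nearest-rank percentile rule are all assumptions made for the example.

```python
import statistics


def nearest_rank(values, pct):
    """Nearest-rank percentile (an assumed rule); pct is an integer from 1 to 100."""
    ordered = sorted(values)
    rank = (pct * len(ordered) + 99) // 100  # integer form of ceil(pct * n / 100)
    return ordered[rank - 1]


def aggregate(spans, span_name, method, attribute=None):
    """Aggregate one attribute (or the spans themselves) for a given span name."""
    matching = [s for s in spans if s["name"] == span_name]

    # Span-level aggregations need no numeric attribute.
    if method == "count":
        return len(matching)
    if not matching:
        raise ValueError(f"no spans named {span_name!r}")
    if method in ("error_rate", "success_rate"):
        wanted = "error" if method == "error_rate" else "ok"
        hits = sum(1 for s in matching if s["status"] == wanted)
        return 100.0 * hits / len(matching)

    # Numeric aggregations read the chosen attribute from each matching span.
    values = [s["attributes"][attribute] for s in matching
              if attribute in s["attributes"]]
    if not values:
        raise ValueError(f"no numeric values for attribute {attribute!r}")
    numeric = {
        "average": statistics.mean,
        "median": statistics.median,
        "max": max,
        "min": min,
        "sum": sum,
        "p90": lambda v: nearest_rank(v, 90),
        "p95": lambda v: nearest_rank(v, 95),
        "p99": lambda v: nearest_rank(v, 99),
    }
    return numeric[method](values)


# Hypothetical spans from one simulation.
spans = [
    {"name": "llm_tool_call", "status": "ok", "attributes": {"duration_ms": 120}},
    {"name": "llm_tool_call", "status": "error", "attributes": {"duration_ms": 480}},
    {"name": "llm_tool_call", "status": "ok", "attributes": {"duration_ms": 150}},
    {"name": "llm_tool_call", "status": "ok", "attributes": {"duration_ms": 250}},
    {"name": "llm", "status": "ok", "attributes": {"confidence_score": 0.9}},
]

print(aggregate(spans, "llm_tool_call", "count"))                   # 4
print(aggregate(spans, "llm_tool_call", "error_rate"))              # 25.0
print(aggregate(spans, "llm_tool_call", "average", "duration_ms"))  # 250
```

Note that `count` and `error_rate` are called without an attribute, while `average` needs one, which mirrors the distinction described above.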
Span Names
Any span name your agent emits can be queried. The following well-known span names map to Coval’s built-in trace components:
| Span Name | Component |
|---|---|
| llm | Language model invocations |
| tts | Speech synthesis |
| stt | Speech recognition |
| llm_tool_call | Individual tool/function calls |
| turn | A single conversation turn |
Custom span names (e.g. document_retrieval, database_lookup) work as well — use whatever names your agent emits.
How to Create
1. Open the Metrics page. Navigate to the Metrics section in the Coval dashboard.
2. Click Create Metric. Select Custom Trace Metrics from the metric type group.
3. Configure the metric. Fill in Span Name, Metric Attribute, and Aggregation Method for your use case.
4. Name and save. Give the metric a descriptive name and save. It is now available to add to any run.
Use Cases
Custom Latency Tracking
Extract average document retrieval latency from your custom retrieval spans:
| Field | Value |
|---|---|
| Span Name | document_retrieval |
| Metric Attribute | retrieval_latency_ms |
| Aggregation Method | Average |
This gives you the average retrieval latency across all turns in the simulation. Compare it across runs to catch regressions after changes to your index, embeddings, or chunking strategy.
p90 External API Latency
Track tail latency for an external service your agent depends on:
| Field | Value |
|---|---|
| Span Name | weather_api |
| Metric Attribute | duration_ms |
| Aggregation Method | p90 |
Use p90 rather than average when you care more about tail performance than typical performance, especially for services that occasionally spike.
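A small worked example shows why the choice of aggregation matters. The durations below are made-up illustrative values, not measurements, and the nearest-rank percentile rule in `nearest_rank` is an assumption; Coval's exact percentile calculation may differ.

```python
import statistics


def nearest_rank(values, pct):
    """Nearest-rank percentile (an assumed rule); pct is an integer from 1 to 100."""
    ordered = sorted(values)
    rank = (pct * len(ordered) + 99) // 100  # integer form of ceil(pct * n / 100)
    return ordered[rank - 1]


# Illustrative data: 17 fast calls and 3 slow ones, in milliseconds.
durations_ms = [120] * 17 + [2000] * 3

average = statistics.mean(durations_ms)    # 402
median = statistics.median(durations_ms)   # 120.0
p90 = nearest_rank(durations_ms, 90)       # 2000

print(average, median, p90)
```

Here the average (402 ms) describes neither the typical call (120 ms) nor the slow ones (2000 ms), while p90 reports the spike directly.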
If your agent emits custom spans for specific tool calls with a duration attribute:
| Field | Value |
|---|---|
| Span Name | database_lookup |
| Metric Attribute | duration_ms |
| Aggregation Method | Average |
If your agent records a confidence score on each language model span:
| Field | Value |
|---|---|
| Span Name | llm |
| Metric Attribute | confidence_score |
| Aggregation Method | Average |
Custom Trace Metrics complement built-in trace metrics like LLM Time to First Byte and TTS Time to First Byte. Use the built-in metrics for standard pipeline components and Custom Trace Metrics for signals specific to your agent’s instrumentation.
Want an AI-assisted setup? Use Tracing Skills to have your coding agent inspect real traces, recommend 3-6 useful metrics, and create only metrics backed by span data that exists.