Metrics Reference
Complete reference for all Prometheus metrics emitted by PromptKit’s unified metrics.Collector.
Overview
Section titled “Overview”PromptKit emits two categories of metrics through a single metrics.Collector:
- Pipeline metrics — operational metrics recorded automatically from EventBus events (provider calls, tool calls, pipeline duration, validation checks)
- Eval metrics — quality metrics recorded from pack-defined
EvalDef.Metricdefinitions
All metrics share a common label structure:
{namespace}_{metric_name}{const_labels, instance_labels, event_labels}Eval metrics use a separate sub-namespace to distinguish them from pipeline metrics:
{namespace}_eval_{metric_name}{const_labels, instance_labels, pack_labels}Where:
- Namespace — configurable prefix (default:
promptkit) - Const labels — process-level labels baked into the metric descriptor (
env,region) - Instance labels — per-conversation labels bound via
Bind()(tenant,prompt_name) - Event labels — per-observation labels specific to each metric (listed in the tables below)
Pipeline Metrics
Section titled “Pipeline Metrics”These are registered at Collector creation time and recorded automatically when MetricContext.OnEvent is wired to the EventBus. Disable with DisablePipelineMetrics: true or use NewEvalOnlyCollector().
Pipeline Duration
Section titled “Pipeline Duration”| Metric | Type | Event Labels | Description |
|---|---|---|---|
{ns}_pipeline_duration_seconds | Histogram | status | Total pipeline execution duration |
status values: success, error
Buckets: 0.1, 0.25, 0.5, 1, 2.5, 5, 10, 30, 60, 120 seconds
Source events: pipeline.completed, pipeline.failed
Provider Metrics
Section titled “Provider Metrics”| Metric | Type | Event Labels | Description |
|---|---|---|---|
{ns}_provider_request_duration_seconds | Histogram | provider, model | LLM API call duration |
{ns}_provider_requests_total | Counter | provider, model, status | Total provider API calls |
{ns}_provider_input_tokens_total | Counter | provider, model | Input tokens sent to provider |
{ns}_provider_output_tokens_total | Counter | provider, model | Output tokens received from provider |
{ns}_provider_cached_tokens_total | Counter | provider, model | Cached tokens in provider calls |
{ns}_provider_cost_total | Counter | provider, model | Total cost in USD |
status values: success, error
Duration buckets: 0.1, 0.25, 0.5, 1, 2.5, 5, 10, 30, 60 seconds
Source events: provider.call.completed, provider.call.failed
Notes:
- Token metrics are only incremented when the count is > 0
- Cost is only incremented when > 0
- Duration is recorded on both success and failure
Tool Call Metrics
Section titled “Tool Call Metrics”| Metric | Type | Event Labels | Description |
|---|---|---|---|
{ns}_tool_call_duration_seconds | Histogram | tool | Tool call execution duration |
{ns}_tool_calls_total | Counter | tool, status | Total tool call count |
status values: success, error
Buckets: 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10 seconds
Source events: tool.call.completed, tool.call.failed
Validation Metrics
Section titled “Validation Metrics”| Metric | Type | Event Labels | Description |
|---|---|---|---|
{ns}_validation_duration_seconds | Histogram | validator, validator_type | Validation check duration |
{ns}_validations_total | Counter | validator, validator_type, status | Validation results |
status values: passed, failed
Buckets: 0.001, 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1 seconds
Source events: validation.passed, validation.failed
Eval Metrics
Section titled “Eval Metrics”Eval metrics are registered dynamically on first observation (with double-checked locking for thread safety). Disable with DisableEvalMetrics: true.
Every eval that runs produces a Prometheus metric. If the eval definition includes an explicit metric field, that definition is used. If no metric field is present, a default gauge metric is auto-generated using the eval ID as the metric name (e.g., eval "response-quality" becomes {ns}_eval_response-quality). This ensures pack authors don’t need to opt in to metrics — every eval result is observable by default.
MetricDef Structure
Section titled “MetricDef Structure”{ "id": "response_quality", "type": "llm_judge", "trigger": "every_turn", "metric": { "name": "response_quality_score", "type": "gauge", "labels": { "eval_type": "llm_judge", "category": "quality" } }}| Field | Required | Description |
|---|---|---|
name | Yes | Prometheus metric name (auto-prefixed with {namespace}_eval_ if not already) |
type | Yes | One of gauge, counter, histogram, boolean |
range | No | Value range hint (min, max) — used for documentation, not enforced |
labels | No | Static labels added to this metric (pack-author defined) |
Eval Metric Types
Section titled “Eval Metric Types”| Type | Prometheus Type | Behavior | Typical Use |
|---|---|---|---|
gauge | Gauge | Set to the eval’s score value | Relevance scores, quality ratings |
counter | Counter | Increment by 1 on each eval execution | Execution counts |
histogram | Histogram | Observe the score value | Score distributions |
boolean | Gauge | Set to 1.0 (pass) or 0.0 (fail) | Binary checks (JSON valid, contains keyword) |
Histogram buckets: Prometheus default buckets (0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10)
Eval Metric Label Ordering
Section titled “Eval Metric Label Ordering”The full label set for an eval metric is: instance labels (sorted) + pack-author labels (sorted by key).
For example, with InstanceLabels: ["tenant"] and metric.labels: {"category": "quality", "eval_type": "llm_judge"}:
myapp_eval_response_quality_score{tenant="acme",category="quality",eval_type="llm_judge"} 0.85Score Extraction
Section titled “Score Extraction”The score value recorded for gauge and histogram types is extracted by ExtractValue as follows:
- If
EvalResult.MetricValueis non-nil, use*MetricValue - Otherwise, if
EvalResult.Scoreis non-nil, use*Score - Otherwise, default to
0.0
For boolean metrics, the value is 1.0 if Score >= 1.0, otherwise 0.0.
Label Architecture
Section titled “Label Architecture”Label Levels
Section titled “Label Levels”Labels come from three sources, applied in order:
| Level | Set At | Scope | Examples |
|---|---|---|---|
| Const labels | CollectorOpts.ConstLabels | Process-wide, baked into descriptor | env, region, service_name |
| Instance labels | Bind(map) / MetricsInstanceLabels | Per-conversation or per-invocation | tenant, prompt_name |
| Event labels | Per-observation (automatic) | Per-metric-observation | provider, model, status, tool |
Instance Label Ordering
Section titled “Instance Label Ordering”InstanceLabels are sorted alphabetically when the Collector is created. When calling Bind(), the map key order doesn’t matter — values are looked up by key name, not position.
// These produce identical results:collector.Bind(map[string]string{"z_tenant": "acme", "a_prompt": "chat"})collector.Bind(map[string]string{"a_prompt": "chat", "z_tenant": "acme"})API Quick Reference
Section titled “API Quick Reference”Conversation API (WithMetrics)
Section titled “Conversation API (WithMetrics)”reg := prometheus.NewRegistry()collector := metrics.NewCollector(metrics.CollectorOpts{ Registerer: reg, Namespace: "myapp", ConstLabels: prometheus.Labels{"env": "prod"}, InstanceLabels: []string{"tenant"},})
conv, _ := sdk.Open("./app.pack.json", "chat", sdk.WithMetrics(collector, map[string]string{"tenant": "acme"}),)Records both pipeline and eval metrics automatically.
Standalone Eval API (sdk.Evaluate)
Section titled “Standalone Eval API (sdk.Evaluate)”collector := metrics.NewEvalOnlyCollector(metrics.CollectorOpts{ Registerer: reg, Namespace: "myapp", InstanceLabels: []string{"tenant"},})
results, _ := sdk.Evaluate(ctx, sdk.EvaluateOpts{ PackPath: "./app.pack.json", Messages: messages, MetricsCollector: collector, MetricsInstanceLabels: map[string]string{"tenant": "acme"},})CollectorOpts
Section titled “CollectorOpts”| Field | Type | Default | Description |
|---|---|---|---|
Registerer | prometheus.Registerer | DefaultRegisterer | Registry to register into |
Namespace | string | "promptkit" | Metric name prefix |
ConstLabels | prometheus.Labels | nil | Process-level constant labels |
InstanceLabels | []string | nil | Label names that vary per conversation (sorted internally) |
DisablePipelineMetrics | bool | false | Skip pipeline metric registration |
DisableEvalMetrics | bool | false | Skip eval metric recording |
Constructors
Section titled “Constructors”| Function | Description |
|---|---|
NewCollector(opts) | Full collector — pipeline + eval metrics |
NewEvalOnlyCollector(opts) | Eval metrics only (DisablePipelineMetrics: true) |
Complete Metric Catalog
Section titled “Complete Metric Catalog”For quick reference, here is every metric name emitted with the default promptkit namespace:
| Metric Name | Type | Category |
|---|---|---|
promptkit_pipeline_duration_seconds | Histogram | Pipeline |
promptkit_provider_request_duration_seconds | Histogram | Provider |
promptkit_provider_requests_total | Counter | Provider |
promptkit_provider_input_tokens_total | Counter | Provider |
promptkit_provider_output_tokens_total | Counter | Provider |
promptkit_provider_cached_tokens_total | Counter | Provider |
promptkit_provider_cost_total | Counter | Provider |
promptkit_tool_call_duration_seconds | Histogram | Tool |
promptkit_tool_calls_total | Counter | Tool |
promptkit_validation_duration_seconds | Histogram | Validation |
promptkit_validations_total | Counter | Validation |
{ns}_eval_{metric_name} | Varies | Eval (explicit pack-defined metric) |
{ns}_eval_{eval_id} | Gauge | Eval (auto-generated when no metric defined) |
Metric-to-Trace Correlation
Section titled “Metric-to-Trace Correlation”PromptKit metrics and traces are correlated through the session ID. The session ID (a UUID) appears as:
- Metrics: instance label (e.g.,
session_id="4e597ba3-92bf-47cf-84f3-29d3ece24456") - Traces:
gen_ai.conversation.idspan attribute - Events:
Event.SessionIDfield
The OTel trace ID equals the session ID with dashes removed (e.g., session 4e597ba3-92bf-47cf-84f3-29d3ece24456 → trace ID 4e597ba392bf47cf84f329d3ece24456), so a single session ID query correlates logs, metrics, and traces.
Prometheus Exemplars
Section titled “Prometheus Exemplars”PromptKit’s built-in Collector does not attach Prometheus exemplars to observations. This is intentional — exemplar configuration (trace ID format, label keys, sampling) is an operator concern.
Operators who want exemplar support (e.g., clicking from a Grafana metric panel to a specific trace in Tempo) can subscribe their own listener to the EventBus and record metrics with exemplars:
bus.SubscribeAll(func(event *events.Event) { if event.Type != events.EventPipelineCompleted { return } data := event.Data.(*events.PipelineCompletedData)
// Derive trace ID from session ID (remove dashes). traceID := strings.ReplaceAll(event.SessionID, "-", "")
// Record with exemplar for Grafana → Tempo linking. hist, _ := pipelineDuration.GetMetricWithLabelValues("success") hist.(prometheus.ExemplarObserver).ObserveWithExemplar( data.Duration.Seconds(), prometheus.Labels{"trace_id": traceID}, )})This approach gives operators full control over which metrics carry exemplars and how trace IDs are derived.
See Also
Section titled “See Also”- Prometheus Metrics How-To — Setup guide with Grafana dashboard and alerts
- Monitor Events — EventBus hooks and metrics in the SDK
- Run Evals — Standalone eval execution with metrics
- Eval Framework — Eval architecture and metric definitions
- Telemetry Reference — OpenTelemetry tracing (complementary to metrics)