gpt-4o · 7d · 4,218 traces
gpt-4o-prod
Q1 Eval Run
Views: Cluster Map, Code Explorer, Calibration, Trace Feed
Runs: 2025-01-14, 2025-01-07, 2024-12-31
1,847 codes · 9 clusters
Cluster Map
All codes
Freq > 20
Outliers
Clusters (9)
Confidence calibration: 312
Deflection patterns: 278
Over-hedging: 241
Factual assertion: 198
Instruction follow: 187
Format deviation: 143
Scope creep: 119
Refusal cascade: 94
Verbosity drift: 88
[Figure: Cluster Map scatter plot with axes UMAP-1 and UMAP-2, one labeled region per cluster. Legend: Active cluster, Cluster, Code. Active cluster: Confidence calibration, 312 codes.]
Confidence calibration
Codes: 312
Judge avg: 0.74
Consensus: 0.61
Drift: +0.13 ↑