Models that served real traffic on our endpoint in the last 7 days
We don't quietly swap models, quantize, or cache outputs. Public evals run daily, every prompt and completion is logged, and you can reproduce any number from your own terminal.
| Model | Precision | Quality | TTFT (time to first token) | Throughput | Price | View |
|---|---|---|---|---|---|---|
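The exact timing definitions behind the TTFT and Throughput columns are not stated on this page. The sketch below assumes one common convention: TTFT is the time from sending the request to receiving the first streamed token, and throughput is the number of tokens after the first divided by the time spent generating them. The function name and its inputs are hypothetical; feed it client-side timestamps collected from any streaming client.

```python
from dataclasses import dataclass


@dataclass
class StreamTiming:
    """Latency figures derived from one streamed completion."""
    ttft_s: float        # seconds from sending the request to the first token
    tokens_per_s: float  # decode throughput after the first token


def measure_stream(request_sent: float, token_times: list[float]) -> StreamTiming:
    """Hypothetical helper using one common TTFT/throughput convention (assumed).

    request_sent: time (seconds) when the request was sent.
    token_times: arrival time (seconds) of each streamed token, in order.
    """
    if not token_times:
        raise ValueError("no tokens received")
    ttft = token_times[0] - request_sent
    decode_window = token_times[-1] - token_times[0]
    # Count only tokens generated after the first one, over the time spent
    # generating them; a single-token response has no decode window.
    if len(token_times) < 2 or decode_window <= 0:
        tps = 0.0
    else:
        tps = (len(token_times) - 1) / decode_window
    return StreamTiming(ttft_s=ttft, tokens_per_s=tps)
```

For example, `measure_stream(0.0, [0.5, 0.75, 1.0, 1.25])` gives a TTFT of 0.5 s and a throughput of 4.0 tokens/s. Timestamps taken on the client include network latency, so results will vary with your location and connection.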
Methodology
How the three columns are derived, where the raw data lives, and how to reproduce it yourself.
Three-way validation
Each quality score appears alongside two third-party references: the figure published by the model authors and the score from Artificial Analysis. Agreement among all three is the strongest signal that the model has not been silently swapped or quantized.
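A minimal sketch of the agreement check, assuming all three scores are reported on the same 0-100 scale. The function name and the 2-point default tolerance are hypothetical illustrations, not a published threshold.

```python
def scores_agree(ours: float, authors: float, artificial_analysis: float,
                 tolerance: float = 2.0) -> bool:
    """Return True if all three quality scores lie within `tolerance` points.

    Assumes a shared 0-100 scale; the 2-point default is hypothetical.
    """
    scores = (ours, authors, artificial_analysis)
    # The spread between the highest and lowest score must stay within tolerance.
    return max(scores) - min(scores) <= tolerance
```

A low score on our endpoint paired with close agreement between the two references is the pattern that would point toward a serving-side change such as quantization.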
Raw logs are public
Every prompt, completion, logprob, and judgement from every eval run is committed daily to a public GitHub repository and kept indefinitely, so anyone can audit, diff, or dispute the results.
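Because the logs are public, a published score can be recomputed directly from them. The sketch below assumes, hypothetically, that each day's log is JSONL with one record per line and a boolean `judgement` field; the actual schema in the repository may differ, so adjust the field name to match.

```python
import json
from typing import Iterable


def score_from_log(lines: Iterable[str]) -> float:
    """Recompute accuracy (0-100) from JSONL eval records.

    Hypothetical schema: each record carries a boolean "judgement" field
    marking the completion correct or incorrect.
    """
    total = 0
    correct = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        total += 1
        correct += bool(record["judgement"])
    if total == 0:
        raise ValueError("log contains no records")
    return 100.0 * correct / total
```

Passing an open file handle for one day's log works because iterating a text file yields its lines.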
Reproducible from your terminal
We use the standard EleutherAI lm-evaluation-harness. Same tool, same task set, same temperature: a score you measure against our endpoint should match what we publish.
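A sketch of building the harness command from Python, assuming an OpenAI-compatible completions endpoint and the harness's `local-completions` model type. The helper name, endpoint URL, model id, task list (`gsm8k` is only an example task) and temperature of 0 are placeholders, not our published configuration; depending on the harness version, extra `--model_args` such as tokenizer settings may be needed.

```python
import shlex


def lm_eval_command(base_url: str, model: str, tasks: list[str],
                    temperature: float = 0.0) -> list[str]:
    """Hypothetical helper: build an lm_eval argv for an OpenAI-compatible
    completions endpoint. All argument values are placeholders."""
    model_args = f"model={model},base_url={base_url},num_concurrent=1"
    return [
        "lm_eval",
        "--model", "local-completions",
        "--model_args", model_args,
        "--tasks", ",".join(tasks),
        "--gen_kwargs", f"temperature={temperature}",  # assumed temperature
        "--log_samples",            # keep per-sample outputs (needs --output_path)
        "--output_path", "results",
    ]


if __name__ == "__main__":
    cmd = lm_eval_command(
        base_url="https://api.example.com/v1/completions",  # placeholder URL
        model="your-model-id",                               # placeholder id
        tasks=["gsm8k"],                                     # example task only
    )
    print(shlex.join(cmd))
```

Passing the list to `subprocess.run` starts the evaluation; `--log_samples` keeps per-sample outputs so they can be compared against the public logs.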