Inference Space


Models that served real traffic on our endpoint in the last 7 days

We don't quietly swap models, quantize, or cache outputs. Public evals run daily, every Q&A is logged, and you can reproduce any number from your own terminal.

Updated daily · Raw logs on GitHub · Verify in your terminal
Model | Precision | Quality | TTFT | Throughput | Price | View

Methodology

How the quality, latency, and throughput columns are derived, where the raw data lives, and how to reproduce the numbers yourself.

01

Three-way validation

Each quality score appears alongside two third-party references: the figure published by the model authors and the score from Artificial Analysis. Agreement across all three is the strongest signal that no silent swap or quantization is happening.
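The three-way check above can be sketched as a one-liner. The scores and the 1.0-point tolerance below are hypothetical placeholders, not our published numbers:

```shell
# Hypothetical scores: ours, the model card's, and Artificial Analysis's
ours=82.1; card=82.4; aa=81.9

# Spread = max - min across the three scores
spread=$(echo "$ours $card $aa" | awk '{max=$1; min=$1; for (i=2; i<=3; i++) {if ($i>max) max=$i; if ($i<min) min=$i}; print max-min}')

# Flag anything outside a (hypothetical) 1.0-point tolerance
awk -v s="$spread" 'BEGIN{exit !(s<=1.0)}' && echo "scores agree" || echo "possible silent swap"
# prints: scores agree
```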

02

Raw logs are public

Every prompt, completion, logprob, and judgment from every eval run is committed to a public GitHub repository, day by day, and kept indefinitely. Anyone can audit, diff, or dispute it.
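Auditing those logs needs nothing beyond standard Unix tools. The JSONL schema below (a per-record "correct" field) is an assumption for illustration, not the repository's actual format:

```shell
# Hypothetical log excerpt in the assumed JSONL schema
cat > sample_log.jsonl <<'EOF'
{"task":"gsm8k","correct":true}
{"task":"gsm8k","correct":false}
{"task":"gsm8k","correct":true}
{"task":"gsm8k","correct":true}
EOF

# Recompute the day's accuracy from the raw records
total=$(wc -l < sample_log.jsonl)
right=$(grep -c '"correct":true' sample_log.jsonl)
awk -v r="$right" -v t="$total" 'BEGIN{printf "accuracy: %.2f\n", r/t}'
# prints: accuracy: 0.75
```

The same recount against a real daily log file should reproduce the published score for that day.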

03

Reproducible from your terminal

We use the standard lm-evaluation-harness. Same tool, same task set, same temperature: the score you measure against our endpoint should match the one we publish.
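A run against an OpenAI-compatible endpoint looks roughly like this. The endpoint URL, model name, task, and few-shot count below are placeholders, not our actual configuration:

```shell
pip install lm-eval                # the lm-evaluation-harness package
export OPENAI_API_KEY=...          # your key for the endpoint

# Hypothetical endpoint and model name; substitute the real ones
lm_eval \
  --model local-chat-completions \
  --model_args model=example-model,base_url=https://api.example.com/v1/chat/completions \
  --tasks gsm8k \
  --num_fewshot 5 \
  --gen_kwargs temperature=0
```

Pinning temperature to 0 keeps the run deterministic enough that independent measurements land on the same score.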