Quality
Evaluation correctness for every served model: benchmark scores and pass rates, not latency.
| Model | Type | Score | Pass rate | Detail |
|---|---|---|---|---|
| Claude Haiku 4.5 | Language | awaiting evaluation | — | |
| Claude Opus 4.8 | Language | awaiting evaluation | — | |
| Claude Sonnet 4.6 | Language | awaiting evaluation | — | |
| GPT Image 2 | Image | awaiting evaluation | — | |
| GPT-5.3 Codex | Language | awaiting evaluation | — | |
| GPT-5.4 | Language | awaiting evaluation | — | |
| GPT-5.4 Mini | Language | awaiting evaluation | — | |
| GPT-5.5 | Language | awaiting evaluation | — | |