Inference Space

Quality

Evaluation correctness for every served model: benchmark scores and pass rates, not latency.

| Model | Family | Type | Score | Pass rate | Detail |
|---|---|---|---|---|---|
| Claude Haiku 4.5 | Claude | Language | awaiting evaluation | | |
| Claude Opus 4.8 | Claude | Language | awaiting evaluation | | |
| Claude Sonnet 4.6 | Claude | Language | awaiting evaluation | | |
| GPT Image 2 | GPT Image 2 | Image | awaiting evaluation | | |
| GPT-5.3 Codex | GPT | Language | awaiting evaluation | | |
| GPT-5.4 | GPT | Language | awaiting evaluation | | |
| GPT-5.4 Mini | GPT | Language | awaiting evaluation | | |
| GPT-5.5 | GPT | Language | awaiting evaluation | | |