Inference Space
Evaluated: N/A

Claude Sonnet 4.6

Claude

Evaluation

This model has not been evaluated yet. Its score will appear here once an evaluation run completes.

Methodology

Each model is run against a fixed suite of curated cases. A case passes when the response meets its rubric; the score aggregates pass outcomes across the suite. Scores are recomputed when the suite or the deployed model changes.
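The page does not specify how pass outcomes are aggregated. As a minimal sketch, assume the score is simply the fraction of cases in the suite that pass their rubric, and that a model with no completed run has no score (shown as "N/A"). The names `CaseResult`, `suite_score`, and `format_score` are hypothetical and for illustration only.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class CaseResult:
    """Hypothetical record of one curated case from an evaluation run."""
    case_id: str
    passed: bool  # True when the response met the case's rubric


def suite_score(results: Sequence[CaseResult]) -> Optional[float]:
    """Assumed aggregation: fraction of passing cases; None if no run has completed."""
    if not results:
        return None
    # bools sum as 0/1, so this counts the passing cases
    return sum(r.passed for r in results) / len(results)


def format_score(score: Optional[float]) -> str:
    """Render the score the way the page displays it, with N/A for no evaluation."""
    return "N/A" if score is None else f"{score:.1%}"


if __name__ == "__main__":
    run = [
        CaseResult("case-1", True),
        CaseResult("case-2", False),
        CaseResult("case-3", True),
        CaseResult("case-4", True),
    ]
    print(format_score(suite_score(run)))  # 75.0%
    print(format_score(suite_score([])))   # N/A
```

Under this assumption, rerunning the same function whenever the suite or the deployed model changes is enough to keep the displayed score current; a weighted or per-category aggregation would replace only the body of `suite_score`.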