Inference Space

API Documentation

Learn how to use the Inference Space API to build AI applications.

Quick Start

Base URL

https://api.inference.space/v1

The API is OpenAI SDK compatible: point any OpenAI SDK at this base URL and use your Inference Space API key.

curl
curl https://api.inference.space/v1/chat/completions \
  -H "Authorization: Bearer sk-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-chat",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Authentication

All API requests require your API key in the Authorization header.

Authorization: Bearer sk-your-api-key

Security Note

Never expose your API key in client-side code. Use environment variables and server-side requests.
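For example, a server-side script can read the key from an environment variable instead of hard-coding it. The variable name below is illustrative, not one the API defines:

```python
import os

# Read the API key from the environment instead of hard-coding it.
# INFERENCE_SPACE_API_KEY is an illustrative variable name.
api_key = os.environ.get("INFERENCE_SPACE_API_KEY", "")
headers = {"Authorization": f"Bearer {api_key}"}
```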

List Models

Get a list of all available models.

GET /v1/models
curl https://api.inference.space/v1/models \
  -H "Authorization: Bearer sk-your-api-key"
Response
{
  "object": "list",
  "data": [
    {
      "id": "deepseek-chat",
      "object": "model",
      "owned_by": "deepseek"
    },
    {
      "id": "gpt-4o",
      "object": "model",
      "owned_by": "openai"
    }
  ]
}
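The available model IDs can be pulled out of this response with a few lines of Python. The payload below mirrors the sample response above:

```python
# Parse the /v1/models response and collect the model IDs.
models_response = {
    "object": "list",
    "data": [
        {"id": "deepseek-chat", "object": "model", "owned_by": "deepseek"},
        {"id": "gpt-4o", "object": "model", "owned_by": "openai"},
    ],
}

model_ids = [m["id"] for m in models_response["data"]]
print(model_ids)  # ['deepseek-chat', 'gpt-4o']
```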

Chat Completions

Send a list of messages and receive a model response. Request and response bodies are compatible with the OpenAI API format.

POST /v1/chat/completions
{
  "model": "deepseek-chat",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"}
  ],
  "temperature": 0.7,
  "max_tokens": 1024,
  "stream": false
}
Response
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "model": "deepseek-chat",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 8,
    "total_tokens": 33
  }
}
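The assistant's reply text and the token usage live at fixed paths in this payload. A short Python sketch using the sample response above:

```python
# Extract the assistant reply and token usage from a chat.completion payload.
completion = {
    "id": "chatcmpl-...",
    "object": "chat.completion",
    "model": "deepseek-chat",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "The capital of France is Paris."},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 25, "completion_tokens": 8, "total_tokens": 33},
}

reply = completion["choices"][0]["message"]["content"]
total_tokens = completion["usage"]["total_tokens"]
print(reply)         # The capital of France is Paris.
print(total_tokens)  # 33
```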

Streaming

Set stream: true in the request body to receive the response incrementally as server-sent events.

Streaming Request
{
  "model": "deepseek-chat",
  "messages": [{"role": "user", "content": "Tell me a joke"}],
  "stream": true
}
Stream Events
data: {"id":"chatcmpl-...","choices":[{"delta":{"role":"assistant"},"index":0}]}

data: {"id":"chatcmpl-...","choices":[{"delta":{"content":"Why"},"index":0}]}

data: {"id":"chatcmpl-...","choices":[{"delta":{"content":" did"},"index":0}]}

data: [DONE]
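If you are not using an SDK, the stream can be parsed by hand: each event is a "data:" line carrying a JSON chunk, and "data: [DONE]" terminates the stream. A minimal Python sketch over the sample events above:

```python
import json

# Accumulate delta content from a sequence of server-sent event lines.
raw_events = [
    'data: {"id":"chatcmpl-1","choices":[{"delta":{"role":"assistant"},"index":0}]}',
    'data: {"id":"chatcmpl-1","choices":[{"delta":{"content":"Why"},"index":0}]}',
    'data: {"id":"chatcmpl-1","choices":[{"delta":{"content":" did"},"index":0}]}',
    "data: [DONE]",
]

text = ""
for line in raw_events:
    payload = line[len("data: "):]
    if payload == "[DONE]":
        break  # end of stream
    delta = chunk = json.loads(payload)["choices"][0]["delta"]
    text += delta.get("content", "")  # first chunk has only "role", no content

print(text)  # Why did
```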

Error Codes

| HTTP Status | Error | Description |
|---|---|---|
| 400 | Bad Request | Invalid request body or parameters |
| 401 | Unauthorized | Invalid or missing API key |
| 402 | Payment Required | Insufficient credits for paid models |
| 429 | Too Many Requests | Rate limit exceeded (RPM or RPD) |
| 500 | Internal Error | Upstream provider error or server issue |
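A common pattern when handling these codes is to retry transient errors (429, 500) and surface client errors (400, 401, 402) immediately. A minimal sketch; the helper name is illustrative:

```python
# Transient statuses worth retrying; client errors (4xx other than 429) are not.
RETRYABLE_STATUSES = {429, 500}

def should_retry(status: int) -> bool:
    """Return True if a request with this HTTP status is worth retrying."""
    return status in RETRYABLE_STATUSES

print(should_retry(429))  # True
print(should_retry(401))  # False
```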

Rate Limits

Default limits per API key are 20 requests per minute (RPM) and 50 requests per day (RPD). Current rate limit state is included in the response headers.

| Header | Description |
|---|---|
| X-RateLimit-Remaining-RPM | Remaining requests per minute |
| X-RateLimit-Remaining-RPD | Remaining requests per day |
| Retry-After | Seconds to wait before retrying (on 429) |
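On a 429, a client can sleep for the Retry-After value before trying again. A hedged sketch, assuming a send callable that returns the status, headers, and body (the names here are illustrative, not part of the API):

```python
import time

def with_retry(send, max_attempts=3):
    """Call send() until it succeeds or attempts run out.

    send is assumed to return (status, headers, body), where headers
    is a dict of response headers. On a 429, sleep for Retry-After
    seconds (defaulting to 1) before the next attempt.
    """
    for _ in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, headers, body
        wait = int(headers.get("Retry-After", "1"))
        time.sleep(wait)
    return status, headers, body  # still rate-limited after all attempts
```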

Code Examples

Python (OpenAI SDK)
from openai import OpenAI

client = OpenAI(
    api_key="sk-your-api-key",
    base_url="https://api.inference.space/v1"
)

response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {"role": "user", "content": "Hello!"}
    ]
)

print(response.choices[0].message.content)
JavaScript (Node.js)
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'sk-your-api-key',
  baseURL: 'https://api.inference.space/v1',
});

const response = await client.chat.completions.create({
  model: 'deepseek-chat',
  messages: [{ role: 'user', content: 'Hello!' }],
});

console.log(response.choices[0].message.content);
Python (Streaming)
from openai import OpenAI

client = OpenAI(
    api_key="sk-your-api-key",
    base_url="https://api.inference.space/v1"
)

stream = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True
)

for chunk in stream:
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)
curl
curl https://api.inference.space/v1/chat/completions \
  -H "Authorization: Bearer sk-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
    "temperature": 0.7,
    "max_tokens": 512
  }'