Inference Space

API Documentation

Learn how to use the Inference Space API to build AI applications.

Quick Start

Base URL

https://api.inference.space/v1

The API is OpenAI SDK compatible: point any OpenAI SDK at this base URL and use your Inference Space API key.

curl
curl https://api.inference.space/v1/chat/completions \
  -H "Authorization: Bearer sk-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-chat",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Authentication

All API requests require your API key in the Authorization header.

Authorization: Bearer sk-your-api-key

Security Note

Never expose your API key in client-side code. Use environment variables and server-side requests.
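For example, a server-side script can read the key from an environment variable instead of hard-coding it. The variable name below is illustrative, not one the API defines:

```python
import os

# Read the API key from the environment instead of hard-coding it.
# INFERENCE_SPACE_API_KEY is an illustrative variable name.
api_key = os.environ.get("INFERENCE_SPACE_API_KEY", "")
headers = {"Authorization": f"Bearer {api_key}"}
```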

List Models

Get a list of all available models.

GET /v1/models
curl https://api.inference.space/v1/models \
  -H "Authorization: Bearer sk-your-api-key"
Response
{
  "object": "list",
  "data": [
    {
      "id": "deepseek-chat",
      "object": "model",
      "owned_by": "deepseek"
    },
    {
      "id": "gpt-4o",
      "object": "model",
      "owned_by": "openai"
    }
  ]
}
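The available model IDs can be pulled out of this response with a few lines of Python. The payload below mirrors the sample response above:

```python
# Parse the /v1/models response and collect the model IDs.
models_response = {
    "object": "list",
    "data": [
        {"id": "deepseek-chat", "object": "model", "owned_by": "deepseek"},
        {"id": "gpt-4o", "object": "model", "owned_by": "openai"},
    ],
}

model_ids = [m["id"] for m in models_response["data"]]
print(model_ids)  # ['deepseek-chat', 'gpt-4o']
```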

Chat Completions

Send a list of messages and receive a model response. Request and response bodies are compatible with the OpenAI API format.

POST /v1/chat/completions
{
  "model": "deepseek-chat",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"}
  ],
  "temperature": 0.7,
  "max_tokens": 1024,
  "stream": false
}
Response
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "model": "deepseek-chat",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 8,
    "total_tokens": 33
  }
}
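The assistant's reply text and the token usage live at fixed paths in this payload. A short Python sketch using the sample response above:

```python
# Extract the assistant reply and token usage from a chat.completion payload.
completion = {
    "id": "chatcmpl-...",
    "object": "chat.completion",
    "model": "deepseek-chat",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "The capital of France is Paris."},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 25, "completion_tokens": 8, "total_tokens": 33},
}

reply = completion["choices"][0]["message"]["content"]
total_tokens = completion["usage"]["total_tokens"]
print(reply)         # The capital of France is Paris.
print(total_tokens)  # 33
```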

Streaming

Set stream: true in the request body to receive the response incrementally as server-sent events.

Streaming Request
{
  "model": "deepseek-chat",
  "messages": [{"role": "user", "content": "Tell me a joke"}],
  "stream": true
}
Stream Events
data: {"id":"chatcmpl-...","choices":[{"delta":{"role":"assistant"},"index":0}]}

data: {"id":"chatcmpl-...","choices":[{"delta":{"content":"Why"},"index":0}]}

data: {"id":"chatcmpl-...","choices":[{"delta":{"content":" did"},"index":0}]}

data: [DONE]
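If you are not using an SDK, the stream can be parsed by hand: each event is a "data:" line carrying a JSON chunk, and "data: [DONE]" terminates the stream. A minimal Python sketch over the sample events above:

```python
import json

# Accumulate delta content from a sequence of server-sent event lines.
raw_events = [
    'data: {"id":"chatcmpl-1","choices":[{"delta":{"role":"assistant"},"index":0}]}',
    'data: {"id":"chatcmpl-1","choices":[{"delta":{"content":"Why"},"index":0}]}',
    'data: {"id":"chatcmpl-1","choices":[{"delta":{"content":" did"},"index":0}]}',
    "data: [DONE]",
]

text = ""
for line in raw_events:
    payload = line[len("data: "):]
    if payload == "[DONE]":
        break  # end of stream
    delta = chunk = json.loads(payload)["choices"][0]["delta"]
    text += delta.get("content", "")  # first chunk has only "role", no content

print(text)  # Why did
```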

Error Codes

| HTTP Status | Error | Description |
|---|---|---|
| 400 | Bad Request | Invalid request body or parameters |
| 401 | Unauthorized | Invalid or missing API key |
| 402 | Payment Required | Insufficient credits for paid models |
| 429 | Too Many Requests | Rate limit exceeded (RPM or RPD) |
| 500 | Internal Error | Upstream provider error or server issue |
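A common pattern when handling these codes is to retry transient errors (429, 500) and surface client errors (400, 401, 402) immediately. A minimal sketch; the helper name is illustrative:

```python
# Transient statuses worth retrying; client errors (4xx other than 429) are not.
RETRYABLE_STATUSES = {429, 500}

def should_retry(status: int) -> bool:
    """Return True if a request with this HTTP status is worth retrying."""
    return status in RETRYABLE_STATUSES

print(should_retry(429))  # True
print(should_retry(401))  # False
```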

Rate Limits

Default limits per API key are 20 requests per minute (RPM) and 50 requests per day (RPD). Current rate limit state is included in the response headers.

| Header | Description |
|---|---|
| X-RateLimit-Remaining-RPM | Remaining requests per minute |
| X-RateLimit-Remaining-RPD | Remaining requests per day |
| Retry-After | Seconds to wait before retrying (on 429) |
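On a 429, a client can sleep for the Retry-After value before trying again. A hedged sketch, assuming a send callable that returns the status, headers, and body (the names here are illustrative, not part of the API):

```python
import time

def with_retry(send, max_attempts=3):
    """Call send() until it succeeds or attempts run out.

    send is assumed to return (status, headers, body), where headers
    is a dict of response headers. On a 429, sleep for Retry-After
    seconds (defaulting to 1) before the next attempt.
    """
    for _ in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, headers, body
        wait = int(headers.get("Retry-After", "1"))
        time.sleep(wait)
    return status, headers, body  # still rate-limited after all attempts
```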

Code Examples

Python (OpenAI SDK)
from openai import OpenAI

client = OpenAI(
    api_key="sk-your-api-key",
    base_url="https://api.inference.space/v1"
)

response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {"role": "user", "content": "Hello!"}
    ]
)

print(response.choices[0].message.content)
JavaScript (Node.js)
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'sk-your-api-key',
  baseURL: 'https://api.inference.space/v1',
});

const response = await client.chat.completions.create({
  model: 'deepseek-chat',
  messages: [{ role: 'user', content: 'Hello!' }],
});

console.log(response.choices[0].message.content);
Python (Streaming)
from openai import OpenAI

client = OpenAI(
    api_key="sk-your-api-key",
    base_url="https://api.inference.space/v1"
)

stream = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True
)

for chunk in stream:
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)
curl
curl https://api.inference.space/v1/chat/completions \
  -H "Authorization: Bearer sk-your-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
    "temperature": 0.7,
    "max_tokens": 512
  }'