# Inworld AI

> Complete API reference: https://docs.inworld.ai/llms-full.txt

Inworld is a research lab focused on realtime voice AI. We build models and APIs for text-to-speech, speech-to-text, and intelligent LLM routing, plus a Realtime API for end-to-end voice conversations. Our TTS models rank #1 on the Artificial Analysis TTS Arena, and the platform is trusted by developers building voice-first applications that make every user feel understood.

## Products

- [TTS API (Text-to-Speech)](https://inworld.ai/tts): #1 ranked on Artificial Analysis TTS Arena. Low-latency streaming TTS with word, phoneme, and viseme timestamps for lipsync. Supports emotion markup, voice cloning from 15 seconds of audio, and 15 production-quality languages. Models: inworld-tts-1.5-max, inworld-tts-1.5-mini.
- [STT API (Speech-to-Text)](https://inworld.ai/speech-to-text): Multi-provider transcription with voice profiling (emotion, accent, intent detection). 99+ languages via Whisper. Research Preview.
- [Router API](https://inworld.ai/router): OpenAI Chat Completions-compatible API that routes to 200+ LLM models (OpenAI, Anthropic, Google, open-source). Single endpoint, single API key, automatic fallback and cost optimization. Free research preview.
- [Realtime API](https://inworld.ai/realtime-api): End-to-end voice pipeline combining STT + LLM + TTS in a single session. WebSocket and WebRTC transports for real-time conversational AI.

## API Reference

### Authentication

All endpoints use HTTP Basic authentication. Send your API key in the `Authorization` header of every request:

```
Authorization: Basic {YOUR_API_KEY}
```
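
A minimal Python sketch of building these headers once and reusing them across calls. It assumes the key is sent verbatim after `Basic `, exactly as in the header above. `auth_headers` and the `INWORLD_API_KEY` environment variable are illustrative names, not part of any Inworld SDK.

```python
import os


def auth_headers(api_key: str) -> dict[str, str]:
    """Build headers for JSON requests (illustrative helper, not an SDK call)."""
    if not api_key:
        raise ValueError("API key is empty")
    # Assumption: the key goes verbatim after "Basic ", as in the header above.
    return {
        "Authorization": f"Basic {api_key}",
        "Content-Type": "application/json",
    }


# INWORLD_API_KEY is a hypothetical variable name; fall back to the placeholder.
headers = auth_headers(os.environ.get("INWORLD_API_KEY", "YOUR_API_KEY"))
```

Reading the key from the environment keeps it out of source control; the placeholder fallback is only there so the snippet runs as written.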

### TTS REST — Single Request

```
POST https://api.inworld.ai/tts/v1/voice
Content-Type: application/json
Authorization: Basic {YOUR_API_KEY}

{
  "text": "Hello, I am Sarah.",
  "voiceId": "Sarah",
  "modelId": "inworld-tts-1.5-max"
}
```

Returns a JSON object whose `audioContent` field holds the base64-encoded audio: `{"audioContent": "base64..."}`
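
As a rough sketch, the response body can be turned into raw audio bytes like this. `decode_audio` is a hypothetical helper name. The audio container format is not specified in this document, so the sketch stops at bytes and does not assume a file extension.

```python
import base64


def decode_audio(payload: dict) -> bytes:
    """Return the raw audio bytes from a parsed TTS response body."""
    if "audioContent" not in payload:
        # Error bodies are assumed not to carry audioContent; show what came back.
        raise KeyError(f"no audioContent in response: {list(payload)}")
    return base64.b64decode(payload["audioContent"])
```

With `requests`, this would typically be called as `decode_audio(response.json())` after checking the HTTP status.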

### TTS Streaming

```
POST https://api.inworld.ai/tts/v1/voice:stream
Content-Type: application/json
Authorization: Basic {YOUR_API_KEY}

{
  "text": "Hello, I am Sarah.",
  "voiceId": "Sarah",
  "modelId": "inworld-tts-1.5-max"
}
```

Returns newline-delimited JSON. Each line is an object whose `result.audioContent` field holds one base64-encoded audio chunk:

```json
{"result":{"audioContent":"base64-encoded-audio-chunk..."}}
{"result":{"audioContent":"base64-encoded-audio-chunk..."}}
```
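
The line-by-line format above can be consumed with a small generator. This is a sketch under the assumption that every non-empty line is a complete JSON object of the shape shown; `iter_audio_chunks` is an illustrative name.

```python
import base64
import json
from typing import Iterable, Iterator


def iter_audio_chunks(lines: Iterable[bytes]) -> Iterator[bytes]:
    """Yield decoded audio bytes from newline-delimited JSON lines."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines between objects
        obj = json.loads(line)
        yield base64.b64decode(obj["result"]["audioContent"])
```

With `requests`, pass `response.iter_lines()` from a request made with `stream=True`. Yielding chunks one at a time lets playback start before the full response arrives.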

### List Voices

```
GET https://api.inworld.ai/voices/v1/voices
Authorization: Basic {YOUR_API_KEY}
```

Returns the list of available voices (271 at the time of writing).
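
A hedged sketch of calling this endpoint with only the standard library. The response schema is not described here, so `fetch_voices` returns the parsed JSON unchanged rather than guessing at field names. Both function names are illustrative.

```python
import json
import urllib.request
from typing import Any

VOICES_URL = "https://api.inworld.ai/voices/v1/voices"


def build_list_voices_request(api_key: str) -> urllib.request.Request:
    """Prepare the GET request without sending it."""
    return urllib.request.Request(
        VOICES_URL,
        headers={"Authorization": f"Basic {api_key}"},
        method="GET",
    )


def fetch_voices(api_key: str) -> Any:
    """Send the request and return the parsed JSON body as-is."""
    request = build_list_voices_request(api_key)
    with urllib.request.urlopen(request, timeout=30) as resp:
        return json.load(resp)
```

Splitting request construction from sending makes the header logic easy to check without network access.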

### Voice Cloning

```
POST https://api.inworld.ai/voices/v1/voices:clone
Authorization: Basic {YOUR_API_KEY}
Content-Type: application/json

{
  "displayName": "MyClonedVoice",
  "langCode": "EN_US",
  "voiceSamples": [{"audioData": "base64-encoded-audio"}]
}
```
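
The `audioData` field expects base64 text, so a recording has to be encoded before it is sent. A minimal sketch of building the request body above from raw bytes follows. `build_clone_payload` is a hypothetical helper, and the accepted audio formats are not covered in this document.

```python
import base64


def build_clone_payload(display_name: str, audio: bytes, lang_code: str = "EN_US") -> dict:
    """Build the JSON body for the clone endpoint from raw audio bytes."""
    if not audio:
        raise ValueError("audio sample is empty")
    return {
        "displayName": display_name,
        "langCode": lang_code,
        # JSON cannot carry raw bytes, so the sample is sent as base64 text.
        "voiceSamples": [{"audioData": base64.b64encode(audio).decode("ascii")}],
    }
```

A typical call would read the sample from disk, for example `build_clone_payload("MyClonedVoice", open("sample.wav", "rb").read())`, where `sample.wav` is a placeholder file name.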

### Speech-to-Text

```
POST https://api.inworld.ai/stt/v1/transcribe
Authorization: Basic {YOUR_API_KEY}
```

### Router (LLM)

OpenAI Chat Completions-compatible. Routes to 200+ models.

```
POST https://api.inworld.ai/v1/chat/completions
Content-Type: application/json
Authorization: Basic {YOUR_API_KEY}

{
  "model": "gpt-5.4",
  "messages": [
    {"role": "user", "content": "Hello"}
  ]
}
```
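
Because the endpoint is described as OpenAI Chat Completions-compatible, the sketch below assumes the response follows the usual `choices[0].message.content` shape; that is an assumption, not something this document confirms. `build_chat_payload` and `extract_reply` are illustrative helpers.

```python
from typing import Optional


def build_chat_payload(prompt: str, model: str = "gpt-5.4", system: Optional[str] = None) -> dict:
    """Build a Chat Completions-style request body."""
    messages = []
    if system:
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": prompt})
    return {"model": model, "messages": messages}


def extract_reply(response_json: dict) -> str:
    """Return the first choice's text (assumes an OpenAI-style response)."""
    return response_json["choices"][0]["message"]["content"]
```

Keeping the model name as a parameter makes it easy to switch between any of the routed models without touching the rest of the call.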

### Realtime API

**WebSocket:**
```
wss://api.inworld.ai/api/v1/realtime/session
```

**WebRTC:**
```
POST https://api.inworld.ai/v1/realtime/calls
```

Combines STT + LLM + TTS in a single persistent session for real-time voice conversations.

## Key Specifications

- **TTS Models**: inworld-tts-1.5-max, inworld-tts-1.5-mini
- **Default Voice**: Sarah
- **TTS Latency**: P90 sub-130ms (Mini), P90 sub-200ms (Max)
- **TTS Pricing**: See https://inworld.ai/pricing
- **STT Pricing**: See https://inworld.ai/pricing
- **Languages**: 15 (optimized for production quality)
- **Voice Cloning**: Single API call with 15 seconds of reference audio
- **Timestamp Data**: Word-level, phoneme-level, and viseme-level for real-time lipsync animation
- **Emotion Support**: Anger, joy, sadness, fear, disgust, surprise via audio markup tags
- **Deployment**: Cloud API + on-premise deployment
- **Router Models**: 200+ models from OpenAI, Anthropic, Google, Meta, Mistral, and more
- **Router Pricing**: Free research preview

## Rankings & Benchmarks

- #1 on Artificial Analysis TTS Arena
- Most trusted voice AI for serious developers

## Use Cases

- Conversational AI agents and voice bots
- AI companions and interactive entertainment
- Language learning applications
- Enterprise voice assistants and support
- Consumer apps with realtime voice

## Quick Start (Python)

```python
import requests
import base64
import json

# REST TTS
response = requests.post(
    "https://api.inworld.ai/tts/v1/voice",
    headers={"Authorization": "Basic YOUR_API_KEY"},
    json={
        "text": "Hello, I am Sarah.",
        "voiceId": "Sarah",
        "modelId": "inworld-tts-1.5-max"
    }
)
response.raise_for_status()  # surface HTTP errors before parsing the body
audio = base64.b64decode(response.json()["audioContent"])

# Streaming TTS
response = requests.post(
    "https://api.inworld.ai/tts/v1/voice:stream",
    headers={"Authorization": "Basic YOUR_API_KEY"},
    json={
        "text": "Hello, I am Sarah.",
        "voiceId": "Sarah",
        "modelId": "inworld-tts-1.5-max"
    },
    stream=True
)
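response.raise_for_status()  # surface HTTP errors before reading the stream
audio_chunks = []  # decoded audio bytes, in arrival order
for line in response.iter_lines():
    if line:  # skip blank lines
        chunk = json.loads(line)
        audio_chunks.append(base64.b64decode(chunk["result"]["audioContent"]))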

# Router (OpenAI-compatible)
response = requests.post(
    "https://api.inworld.ai/v1/chat/completions",
    headers={"Authorization": "Basic YOUR_API_KEY"},
    json={
        "model": "gpt-5.4",
        "messages": [{"role": "user", "content": "Hello"}]
    }
)
```

## Quick Start (JavaScript)

```javascript
// REST TTS
const response = await fetch('https://api.inworld.ai/tts/v1/voice', {
  method: 'POST',
  headers: {
    'Authorization': 'Basic YOUR_API_KEY',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    text: 'Hello, I am Sarah.',
    voiceId: 'Sarah',
    modelId: 'inworld-tts-1.5-max'
  })
});
const data = await response.json();
const audioBytes = Uint8Array.from(atob(data.audioContent), c => c.charCodeAt(0));

// Streaming TTS
const stream = await fetch('https://api.inworld.ai/tts/v1/voice:stream', {
  method: 'POST',
  headers: {
    'Authorization': 'Basic YOUR_API_KEY',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    text: 'Hello, I am Sarah.',
    voiceId: 'Sarah',
    modelId: 'inworld-tts-1.5-max'
  })
});
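const reader = stream.body.getReader();
const decoder = new TextDecoder();
const audioChunks = []; // base64 strings, in arrival order
let buffer = '';        // partial line carried over between reads
while (true) {
  const { done, value } = await reader.read();
  // A read can end mid-line, so parse only the complete lines and keep the rest.
  buffer += decoder.decode(value, { stream: !done });
  const lines = buffer.split('\n');
  buffer = done ? '' : lines.pop();
  for (const line of lines.map(l => l.trim()).filter(Boolean)) {
    audioChunks.push(JSON.parse(line).result.audioContent);
  }
  if (done) break;
}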
```

## Documentation

- [Docs Home](https://docs.inworld.ai/introduction)
- [TTS (Text-to-Speech)](https://docs.inworld.ai/tts/tts)
- [STT (Speech-to-Text)](https://docs.inworld.ai/stt/overview)
- [Realtime API](https://docs.inworld.ai/realtime/overview)
- [LLM Router](https://docs.inworld.ai/router/introduction)
- [GitHub Organization](https://github.com/inworld-ai)

## Machine-Readable Data

- [Models JSON](https://inworld.ai/models.json): Machine-readable list of all LLM models available through Inworld Router.
- [agents.json](https://inworld.ai/.well-known/agents.json): Machine-readable agent capabilities description.

## Company

- **Website**: https://inworld.ai
- **Documentation**: https://docs.inworld.ai
- **GitHub**: https://github.com/inworld-ai
- **Founded**: 2021
- **Focus**: Research lab building realtime voice AI infrastructure