Text-to-Speech API
Convert text to natural-sounding speech using 2,700+ pre-trained voices.
Endpoint
Authentication
Request
Headers
| Header | Required | Description |
|---|---|---|
| Authorization | Yes | Bearer YOUR_API_KEY |
| Content-Type | Yes | text/plain for plain text, application/ssml+xml for SSML |
| X-Voice-ID | No | Voice slug (e.g., jenny-en-us). Defaults to Jenny (English US). |
| X-Store-Audio | No | true to store audio and receive a URL instead of binary. Default: false. |
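
As a rough illustration of the header table above, the following Python sketch assembles the request headers. The helper name `build_headers` and its parameters are hypothetical; only the header names and values come from the table.

```python
from typing import Dict, Optional


def build_headers(api_key: str,
                  ssml: bool = False,
                  voice_id: Optional[str] = None,
                  store_audio: bool = False) -> Dict[str, str]:
    """Assemble request headers (hypothetical helper, not part of the API)."""
    headers = {
        # Required on every request.
        "Authorization": f"Bearer {api_key}",
        # Required: text/plain for plain text, application/ssml+xml for SSML.
        "Content-Type": "application/ssml+xml" if ssml else "text/plain",
    }
    # Optional: omit to use the default voice.
    if voice_id is not None:
        headers["X-Voice-ID"] = voice_id
    # Optional: only sent when a stored-audio URL is wanted instead of binary.
    if store_audio:
        headers["X-Store-Audio"] = "true"
    return headers


headers = build_headers("YOUR_API_KEY", voice_id="jenny-en-us")
```

The resulting dictionary can be passed to any HTTP client, together with the endpoint URL and the text as the request body.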
Body
The request body is the text to convert. Send as plain text or SSML markup.
Plain text:
SSML:
Limits
- Maximum text length: 25,000 characters per request.
- Texts longer than the provider's chunk limit are automatically split at sentence boundaries and processed in parallel.
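
A client can check the length limit before sending, which avoids a round trip that would end in a 400 response. This is a minimal sketch: `check_length` is a hypothetical helper, and it assumes characters are counted the way Python's `len` counts them, which may differ from how the server counts (for example, for SSML markup).

```python
MAX_CHARS = 25_000  # per-request limit stated above


def check_length(text: str) -> None:
    """Raise ValueError if the text is empty or exceeds the documented limit."""
    if not text:
        raise ValueError("Request body is required")
    if len(text) > MAX_CHARS:
        raise ValueError(
            f"Text exceeds maximum length: {len(text)} > {MAX_CHARS}"
        )


check_length("Hello, world.")  # passes silently
```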
Response
Binary Audio (default)
When X-Store-Audio is omitted or set to false, the response body is the raw audio binary.
Response Headers:
| Header | Description |
|---|---|
| X-Characters-Processed | Total characters processed. |
| X-Chunks-Processed | Number of chunks (for split texts). |
| X-Response-Time-Ms | Processing time in milliseconds. |
| X-Cost-Cents | Cost in cents. |
| X-Balance-Cents | Remaining balance in cents. |
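
The usage headers can be collected into a dictionary for logging or billing checks. The sketch below assumes every value is a numeric string; whether the values are integers or decimals is not specified here, so they are parsed as floats. `parse_usage` and the sample values are illustrative only.

```python
from typing import Dict, Mapping

# Header name -> key used in the returned dictionary (key names are arbitrary).
USAGE_HEADERS = {
    "X-Characters-Processed": "characters",
    "X-Chunks-Processed": "chunks",
    "X-Response-Time-Ms": "response_time_ms",
    "X-Cost-Cents": "cost_cents",
    "X-Balance-Cents": "balance_cents",
}


def parse_usage(headers: Mapping[str, str]) -> Dict[str, float]:
    """Extract the usage headers that are present, ignoring header-name case."""
    lowered = {name.lower(): value for name, value in headers.items()}
    usage = {}
    for header, key in USAGE_HEADERS.items():
        value = lowered.get(header.lower())
        if value is not None:
            usage[key] = float(value)  # assumption: values are numeric strings
    return usage


# Made-up sample values, for illustration only.
usage = parse_usage({"X-Characters-Processed": "1500", "x-cost-cents": "4"})
```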
Stored Audio (X-Store-Audio: true)
Returns a JSON object with a URL to the stored audio:
Pricing
- $0.025 per 1,000 characters ($25 per 1 million characters).
- Cost is calculated per character and rounded up.
- Example: A 1,500-character request costs 1,500 × $0.025 / 1,000 = $0.0375, or approximately $0.04.
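
The pricing rule can be sketched with integer arithmetic. One assumption is made: "rounded up" is taken to mean rounding up to the next whole cent, which matches the 1,500-character example but is not stated explicitly. `estimate_cost_cents` is a hypothetical helper.

```python
def estimate_cost_cents(num_chars: int) -> int:
    """Estimate the cost in cents, rounded up to a whole cent (assumption)."""
    # $0.025 per 1,000 characters equals 25 cents per 10,000 characters.
    # Adding 9,999 before the floor division rounds the result up.
    return (num_chars * 25 + 9_999) // 10_000


cost = estimate_cost_cents(1_500)  # 3.75 cents, rounded up to 4
```

Integer arithmetic avoids floating-point rounding surprises near cent boundaries.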
Supported Audio Formats
| Format | Content Type |
|---|---|
| MP3 | audio/mpeg |
| WAV | audio/wav |
| OGG | audio/ogg |
Default output format is MP3.
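
When saving binary responses, the file extension can be chosen from the content type. This sketch assumes the response carries a `Content-Type` header matching the table above; `extension_for` is a hypothetical helper, and unknown types fall back to `.bin` so a file is not mislabeled.

```python
# Content types from the table above, mapped to file extensions.
EXTENSIONS = {
    "audio/mpeg": ".mp3",
    "audio/wav": ".wav",
    "audio/ogg": ".ogg",
}


def extension_for(content_type: str) -> str:
    """Return a file extension for a Content-Type value."""
    # Drop any parameters (e.g. "; charset=...") and normalize case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return EXTENSIONS.get(media_type, ".bin")


filename = "speech" + extension_for("audio/mpeg")
```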
SSML Support
Set Content-Type: application/ssml+xml to use SSML.
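
SSML bodies are XML, so characters such as `&` and `<` in the spoken text must be escaped. The following sketch builds a small SSML body using the `<break>` and `<emphasis>` tags. The helper names are hypothetical, and wrapping the content in a `<speak>` root element follows the W3C SSML convention; it is an assumption that this API expects it.

```python
from xml.sax.saxutils import escape, quoteattr


def text(value: str) -> str:
    """Escape plain text so it is safe inside an XML document."""
    return escape(value)


def pause(milliseconds: int) -> str:
    """Insert a pause using <break>."""
    return f'<break time="{milliseconds}ms"/>'


def emphasize(value: str, level: str = "strong") -> str:
    """Wrap text in <emphasis> with the given level."""
    return f"<emphasis level={quoteattr(level)}>{escape(value)}</emphasis>"


def ssml_document(*parts: str) -> str:
    """Join fragments under a <speak> root (assumed, per W3C SSML)."""
    return "<speak>" + "".join(parts) + "</speak>"


body = ssml_document(
    text("Terms & conditions apply."),
    pause(500),
    emphasize("Important"),
)
```

The resulting string is sent as the request body with `Content-Type: application/ssml+xml`.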
| Tag | Description | Example |
|---|---|---|
| `<break>` | Insert a pause | `<break time="500ms"/>` |
| `<prosody>` | Control rate, pitch, volume | `<prosody rate="slow">Slow speech</prosody>` |
| `<emphasis>` | Add emphasis | `<emphasis level="strong">Important</emphasis>` |
| `<say-as>` | Control interpretation | `<say-as interpret-as="date">2024-01-15</say-as>` |
| `<phoneme>` | Specify pronunciation | `<phoneme alphabet="ipa" ph="təˈmeɪtoʊ">tomato</phoneme>` |
Automatic Chunking
For texts longer than a provider's processing limit (typically 2,500–4,500 characters), Verbatik automatically:
- Splits text at sentence boundaries to preserve natural speech flow.
- Processes chunks in parallel for faster generation.
- Concatenates audio chunks into a single output.
This is transparent: you send the full text and receive a single audio response.
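
To illustrate the idea of splitting at sentence boundaries, here is a simplified client-side sketch. It is not Verbatik's implementation: the regular expression, the greedy packing strategy, and the 2,500-character default are assumptions chosen for the example, and parallel processing and audio concatenation are left out.

```python
import re
from typing import List


def split_into_chunks(text: str, limit: int = 2500) -> List[str]:
    """Greedily pack whole sentences into chunks of at most `limit` characters.

    A single sentence longer than `limit` is kept as its own oversized chunk.
    """
    # Split after '.', '!' or '?' when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit or not current:
            # The sentence fits, or it starts a new chunk on its own.
            current = candidate
        else:
            # Adding the sentence would overflow: close the current chunk.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


chunks = split_into_chunks("One. Two! Three?", limit=9)
```

With a limit of 9, the first two sentences fit in one chunk and the third starts a new one.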
Examples
Basic TTS
Store Audio and Get URL
SSML Request
Error Responses
| Status | Error | Description |
|---|---|---|
| 400 | Request body is required | No text provided. |
| 400 | Text exceeds maximum length | Text exceeds 25,000 characters. |
| 401 | Invalid or missing API token | API key is missing, invalid, or expired. |
| 402 | Insufficient balance | Workspace balance too low. Top up your account. |
| 429 | Rate limit exceeded | Too many requests. |
| 500 | Internal server error | Unexpected error. Contact support if it persists. |
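
A client might map these status codes to actions as in the sketch below. Retrying 429 and 500 responses with exponential backoff is a common client convention, not something this reference prescribes; the function names, action labels, and delay values are all hypothetical.

```python
def classify(status: int) -> str:
    """Map an HTTP status code from the table above to a client action."""
    if 200 <= status < 300:
        return "ok"
    if status in (429, 500):
        return "retry"          # transient: try again after a delay
    if status == 402:
        return "top_up"         # balance too low
    if status == 401:
        return "check_api_key"  # missing, invalid, or expired key
    if status == 400:
        return "fix_request"    # empty body or text too long
    return "unknown"


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential backoff in seconds: base * 2**attempt, capped at `cap`."""
    return min(cap, base * (2 ** attempt))


action = classify(429)
delay = backoff_delay(3)
```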