Text-to-Speech API

Convert text into natural-sounding speech using over 2,700 pre-trained voices.

Endpoint

POST /api/v1/tts

Authentication

Authorization: Bearer YOUR_API_KEY

Request

Headers

Header	Required	Description
`Authorization`	Yes	`Bearer YOUR_API_KEY`
`Content-Type`	Yes	`text/plain` for plain text, `application/ssml+xml` for SSML
`X-Voice-ID`	No	Voice slug (e.g., `jenny-en-us`). Defaults to Jenny (English US).
`X-Store-Audio`	No	`true` to store audio and receive a URL instead of binary. Default: `false`.

Body

The request body is the text to convert. Send it either as plain text or as SSML markup, with the matching `Content-Type` header.

Plain text:

Hello, welcome to Verbatik! This is a demonstration of our text-to-speech API.

SSML:

<speak>
  Hello, <break time="500ms"/> welcome to Verbatik!
  <prosody rate="slow">This is spoken slowly.</prosody>
</speak>
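As a rough illustration of how the headers and body fit together, the sketch below builds (but does not send) a request with Python's standard library. The function name `build_tts_request` and its parameters are hypothetical; the URL is taken from the curl examples on this page, and UTF-8 body encoding is an assumption, since the page does not state one.

```python
import urllib.request

# URL as used in the curl examples on this page.
TTS_URL = "https://api.verbatik.com/api/v1/tts"


def build_tts_request(text, api_key, voice_id=None, store_audio=False, ssml=False):
    """Build (but do not send) a POST request for the TTS endpoint."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        # SSML bodies need the SSML content type; everything else is plain text.
        "Content-Type": "application/ssml+xml" if ssml else "text/plain",
    }
    if voice_id is not None:
        headers["X-Voice-ID"] = voice_id  # optional; the API has a default voice
    if store_audio:
        headers["X-Store-Audio"] = "true"  # ask for a JSON body with a URL
    # Assumption: the body is UTF-8 encoded (not stated in the documentation).
    return urllib.request.Request(
        TTS_URL, data=text.encode("utf-8"), headers=headers, method="POST"
    )


req = build_tts_request(
    "Hello, welcome to Verbatik!", "vbt_your_api_key", voice_id="jenny-en-us"
)
print(req.get_method(), req.full_url)
```

Passing the result to `urllib.request.urlopen(req)` would send it; that step is left out here so the sketch runs without network access or a real key.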

Limits

Maximum text length: 25,000 characters per request.
Texts longer than the provider's chunk limit are automatically split at sentence boundaries and processed in parallel.
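A client can check the length limit locally before spending a request. The sketch below is one way to do that; `check_length` is a hypothetical helper, and it assumes the limit counts Unicode code points as Python's `len()` does. The service may count differently, for example by bytes or by excluding SSML tags.

```python
MAX_CHARS = 25_000  # per-request limit stated above


def check_length(text):
    """Return the character count, or raise ValueError if the text would be rejected."""
    # Assumption: the limit is measured in code points, which is what len() counts.
    n = len(text)
    if n == 0:
        raise ValueError("request body is required")
    if n > MAX_CHARS:
        raise ValueError(f"text is {n} characters; the limit is {MAX_CHARS}")
    return n


print(check_length("Hello, welcome to Verbatik!"))
```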

Response

Binary Audio (default)

When `X-Store-Audio` is omitted or set to `false`, the response body is the raw audio binary.

Response Headers:

Header	Description
`X-Characters-Processed`	Total characters processed.
`X-Chunks-Processed`	Number of chunks (for split texts).
`X-Response-Time-Ms`	Processing time in milliseconds.
`X-Cost-Cents`	Cost in cents.
`X-Balance-Cents`	Remaining balance in cents.
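The usage headers above can be collected into a small record for logging or budget checks. This is a minimal sketch: `TtsUsage` and `parse_usage_headers` are hypothetical names, and it assumes every header value is a whole number sent as a string.

```python
from dataclasses import dataclass


@dataclass
class TtsUsage:
    characters: int
    chunks: int
    response_time_ms: int
    cost_cents: int
    balance_cents: int


def parse_usage_headers(headers):
    """Read the documented usage headers from a mapping of header names to values."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lower = {k.lower(): v for k, v in headers.items()}
    # Assumption: each value parses as an integer.
    return TtsUsage(
        characters=int(lower["x-characters-processed"]),
        chunks=int(lower["x-chunks-processed"]),
        response_time_ms=int(lower["x-response-time-ms"]),
        cost_cents=int(lower["x-cost-cents"]),
        balance_cents=int(lower["x-balance-cents"]),
    )
```

With `urllib`, the mapping could come from `dict(response.headers.items())`.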

Stored Audio (X-Store-Audio: true)

Returns a JSON object with a URL to the stored audio:

{
  "success": true,
  "audio_url": "https://storage.verbatik.com/audio/abc123.mp3",
  "characters_processed": 1500,
  "chunks_processed": 1,
  "cost_cents": 4,
  "balance_cents": 1996,
  "response_time_ms": 1200
}
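A client that sets `X-Store-Audio: true` needs to decode this JSON rather than write bytes to disk. The sketch below shows one way to pull out the URL; `parse_stored_audio` is a hypothetical helper, and treating a missing or false `success` field as a failure is an assumption, since the page only shows the successful case.

```python
import json


def parse_stored_audio(body):
    """Return (audio_url, cost_cents) from a stored-audio JSON response body."""
    data = json.loads(body)
    # Assumption: failures are signalled by "success" being false or absent.
    if not data.get("success"):
        raise RuntimeError("TTS request did not succeed")
    return data["audio_url"], data["cost_cents"]


sample = """{
  "success": true,
  "audio_url": "https://storage.verbatik.com/audio/abc123.mp3",
  "cost_cents": 4,
  "balance_cents": 1996
}"""
url, cost = parse_stored_audio(sample)
print(url, cost)
```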

Pricing

$0.025 per 1,000 characters ($25 per 1 million characters).
Cost is calculated per character and rounded up to the next whole cent.
Example: a 1,500-character request costs 3.75 cents before rounding, so it is billed as $0.04.
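The pricing arithmetic can be reproduced client-side to estimate a charge before sending. This sketch assumes rounding up happens once per request, to a whole cent, which matches the 1,500-character example but is not spelled out further on this page. `estimate_cost_cents` is a hypothetical helper name.

```python
def estimate_cost_cents(num_chars):
    """Estimate the charge for one request, in whole cents."""
    # $0.025 per 1,000 characters equals 25 cents per 10,000 characters.
    # Integer ceiling division avoids floating-point rounding surprises.
    return (num_chars * 25 + 9_999) // 10_000


print(estimate_cost_cents(1_500))      # 4 cents, i.e. $0.04
print(estimate_cost_cents(1_000_000))  # 2500 cents, i.e. $25
```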

Supported Audio Formats

Format	Content Type
MP3	`audio/mpeg`
WAV	`audio/wav`
OGG	`audio/ogg`

Default output format is MP3.
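When saving binary responses, the content types in the table can be mapped to file extensions. The sketch below assumes the response's `Content-Type` header carries one of these values; falling back to `.mp3` for anything unrecognised is a choice made here because MP3 is the documented default, not documented behaviour. `extension_for` is a hypothetical helper.

```python
# Content types from the table above, mapped to conventional file extensions.
EXTENSIONS = {
    "audio/mpeg": ".mp3",
    "audio/wav": ".wav",
    "audio/ogg": ".ogg",
}


def extension_for(content_type):
    """Pick a file extension for a response Content-Type value."""
    # Drop any parameters after ";" and normalise case before the lookup.
    media_type = content_type.split(";", 1)[0].strip().lower()
    # Assumption: unknown types are treated as the default format, MP3.
    return EXTENSIONS.get(media_type, ".mp3")


print(extension_for("audio/wav"))
```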

SSML Support

Set Content-Type: application/ssml+xml to use SSML.

Tag	Description	Example
`<break>`	Insert a pause	`<break time="500ms"/>`
`<prosody>`	Control rate, pitch, volume	`<prosody rate="slow">Slow speech</prosody>`
`<emphasis>`	Add emphasis	`<emphasis level="strong">Important</emphasis>`
`<say-as>`	Control interpretation	`<say-as interpret-as="date">2024-01-15</say-as>`
`<phoneme>`	Specify pronunciation	`<phoneme alphabet="ipa" ph="təˈmeɪtoʊ">tomato</phoneme>`
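Text placed inside SSML must be XML-escaped, or characters such as `&` and `<` will make the document malformed. The sketch below assembles a small SSML body from helper functions using the standard library's escaping routines. The helper names (`speak`, `text`, `pause`, `prosody`) are hypothetical and cover only a few of the tags in the table.

```python
from xml.sax.saxutils import escape, quoteattr


def text(s):
    """Escape plain text for use inside SSML."""
    return escape(s)


def pause(ms):
    """A <break> of the given length in milliseconds."""
    return f'<break time="{int(ms)}ms"/>'


def prosody(s, rate):
    """Wrap escaped text in a <prosody> tag with the given rate."""
    # quoteattr returns the value already wrapped in quotes and escaped.
    return f"<prosody rate={quoteattr(rate)}>{escape(s)}</prosody>"


def speak(*parts):
    """Join already-escaped parts into a complete SSML document."""
    return "<speak>" + "".join(parts) + "</speak>"


ssml = speak(text("Fish & chips"), pause(500), prosody("This is spoken slowly.", "slow"))
print(ssml)
```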

Automatic Chunking

For texts longer than a provider's processing limit (typically 2,500–4,500 characters), Verbatik automatically:

1. Splits text at sentence boundaries to preserve natural speech flow.
2. Processes chunks in parallel for faster generation.
3. Concatenates audio chunks into a single output.

This is transparent: you send the full text and receive a single audio response.
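To make the splitting step concrete, here is a simplified sketch of sentence-boundary chunking. It is not Verbatik's implementation: the regular expression is a rough boundary heuristic, the 2,500-character default is only the lower end of the range quoted above, and `split_into_chunks` is a hypothetical name. Parallel processing and audio concatenation are not shown.

```python
import re


def split_into_chunks(text, limit=2500):
    """Greedily pack whole sentences into chunks of at most `limit` characters."""
    # Split after ., ! or ? when followed by whitespace (a rough sentence boundary).
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            if current:
                chunks.append(current)
            # A single sentence longer than the limit is kept whole in this sketch.
            current = sentence
    if current:
        chunks.append(current)
    return chunks


print(split_into_chunks("One. Two! Three? Four.", limit=10))
# ['One. Two!', 'Three?', 'Four.']
```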

Examples

Basic TTS

curl -X POST https://api.verbatik.com/api/v1/tts \
  -H "Authorization: Bearer vbt_your_api_key" \
  -H "Content-Type: text/plain" \
  -H "X-Voice-ID: jenny-en-us" \
  --data "Hello, this is a test of the Verbatik text-to-speech API." \
  --output output.mp3

Store Audio and Get URL

curl -X POST https://api.verbatik.com/api/v1/tts \
  -H "Authorization: Bearer vbt_your_api_key" \
  -H "Content-Type: text/plain" \
  -H "X-Voice-ID: aria-en-us" \
  -H "X-Store-Audio: true" \
  --data "This audio will be stored and a URL will be returned."

SSML Request

curl -X POST https://api.verbatik.com/api/v1/tts \
  -H "Authorization: Bearer vbt_your_api_key" \
  -H "Content-Type: application/ssml+xml" \
  -H "X-Voice-ID: jenny-en-us" \
  --data '<speak>Hello! <break time="1s"/> Welcome to Verbatik.</speak>' \
  --output output.mp3

Error Responses

Status	Error	Description
400	`Request body is required`	No text provided.
400	`Text exceeds maximum length`	Text exceeds 25,000 characters.
401	`Invalid or missing API token`	API key is missing, invalid, or expired.
402	`Insufficient balance`	Workspace balance too low. Top up your account.
429	`Rate limit exceeded`	Too many requests.
500	`Internal server error`	Unexpected error. Contact support if it persists.
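One way a client might act on these status codes is to separate errors worth retrying from those that need a change on the caller's side. The sketch below treats 429 and 500 as retryable with exponential backoff. That policy, the delay values, and the names `classify_status` and `backoff_delays` are assumptions; this page does not document retry guidance or rate-limit headers.

```python
# Assumption: only rate limiting and server errors are worth retrying.
RETRYABLE = {429, 500}


def classify_status(status):
    """Return 'ok', 'retry', or 'fail' for an HTTP status code."""
    if 200 <= status < 300:
        return "ok"
    if status in RETRYABLE:
        return "retry"
    # 400, 401 and 402 need a fix by the caller (body, key, or balance).
    return "fail"


def backoff_delays(attempts, base=1.0, cap=30.0):
    """Exponential backoff delays in seconds, capped at `cap`."""
    return [min(cap, base * 2 ** i) for i in range(attempts)]


print(classify_status(429), backoff_delays(4))
```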
