Text to Speech

Generate expressive, studio-quality speech from text — buffered or streamed.

The `POST /v1/speech` endpoint turns text into natural speech. Pick a model, a voice, and an output format; the response body is the encoded audio, returned as a binary stream.

Basic request

curl https://api.vocenza.com/v1/speech \
  -H "Authorization: Bearer $VOCENZA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "vocenza-tts-1",
    "voice": "aria",
    "input": "The quick brown fox jumps over the lazy dog.",
    "format": "mp3"
  }' --output speech.mp3

Parameters

Parameter	Type	Description
`model`	string	TTS model id. `vocenza-tts-1` (quality) or `vocenza-tts-1-flash` (lowest latency).
`voice`	string	Voice id, e.g. `aria`, `atlas`, `nova`. See `GET /voices`.
`input`	string	The text to synthesize. Up to 5,000 characters per request.
`format`	string	`mp3`, `wav`, `opus`, or `pcm16`. Defaults to `mp3`.
`speed`	number	Playback rate from `0.5` to `2.0`. Defaults to `1.0`.
`sample_rate`	number	Output sample rate in Hz (e.g. `24000`).
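
The limits in the table can be checked client-side before a request is sent. The sketch below is one way to do that, not part of any official SDK: the `SpeechRequest` type and the `validateSpeechRequest` and `splitInput` helpers are hypothetical names. It also assumes that "characters" can be approximated by JavaScript string length (UTF-16 code units), which may not match how the API counts them.

```typescript
// Field names and limits come from the parameter table;
// the type and helper names are hypothetical.
type SpeechFormat = "mp3" | "wav" | "opus" | "pcm16";

interface SpeechRequest {
  model: string;
  voice: string;
  input: string;
  format?: SpeechFormat;
  speed?: number;
  sample_rate?: number;
}

const MAX_INPUT_CHARS = 5000;

// Returns a list of problems; an empty list means the request passed these checks.
function validateSpeechRequest(req: SpeechRequest): string[] {
  const problems: string[] = [];
  if (req.input.length === 0) problems.push("input is empty");
  if (req.input.length > MAX_INPUT_CHARS) {
    problems.push(`input exceeds ${MAX_INPUT_CHARS} characters`);
  }
  if (req.speed !== undefined && (req.speed < 0.5 || req.speed > 2.0)) {
    problems.push("speed must be between 0.5 and 2.0");
  }
  return problems;
}

// Splits long text into chunks of at most `limit` characters,
// breaking after sentence-ending punctuation where possible.
function splitInput(text: string, limit: number = MAX_INPUT_CHARS): string[] {
  const sentences = text.match(/[^.!?]*[.!?]+\s*|[^.!?]+$/g) ?? [];
  const chunks: string[] = [];
  let current = "";
  for (const sentence of sentences) {
    let rest = sentence;
    // A single sentence longer than the limit is cut at the limit.
    while (rest.length > limit) {
      if (current) {
        chunks.push(current);
        current = "";
      }
      chunks.push(rest.slice(0, limit));
      rest = rest.slice(limit);
    }
    if (current.length + rest.length > limit) {
      chunks.push(current);
      current = "";
    }
    current += rest;
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Each chunk returned by `splitInput` can then be sent as the `input` of its own request, and the resulting audio played or concatenated in order.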

Streaming

For the lowest time-to-first-audio, stream the response and play chunks as they arrive instead of waiting for the full file.

stream.ts

const res = await fetch("https://api.vocenza.com/v1/speech", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.VOCENZA_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "vocenza-tts-1-flash",
    voice: "aria",
    input: "Streaming keeps latency low for live experiences.",
    format: "pcm16",
  }),
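});

// Fail fast on HTTP errors instead of feeding an error body to the audio output.
if (!res.ok || !res.body) {
  throw new Error(`Speech request failed with status ${res.status}`);
}

const reader = res.body.getReader();
for (;;) {
  const { done, value } = await reader.read();
  if (done) break;
  enqueueToAudioOutput(value); // your playback buffer
}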
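
Network chunks are not guaranteed to end on a sample boundary, so a 16-bit sample can be split across two reads. The sketch below shows one way to handle that before handing samples to a playback buffer. The `Pcm16Decoder` class is a hypothetical helper, and it assumes `pcm16` means signed 16-bit little-endian samples, which this page does not state.

```typescript
// Hypothetical helper. Assumes signed 16-bit little-endian PCM.
class Pcm16Decoder {
  // Holds the trailing byte of a chunk that ended mid-sample.
  private leftover: number | null = null;

  // Converts one network chunk into float samples in the range [-1, 1).
  push(chunk: Uint8Array): Float32Array {
    let bytes = chunk;
    if (this.leftover !== null) {
      // Prepend the byte carried over from the previous chunk.
      bytes = new Uint8Array(chunk.length + 1);
      bytes[0] = this.leftover;
      bytes.set(chunk, 1);
      this.leftover = null;
    }
    const sampleCount = bytes.length >> 1;
    if (bytes.length % 2 === 1) {
      this.leftover = bytes[bytes.length - 1];
    }
    const view = new DataView(bytes.buffer, bytes.byteOffset, sampleCount * 2);
    const out = new Float32Array(sampleCount);
    for (let i = 0; i < sampleCount; i++) {
      out[i] = view.getInt16(i * 2, true) / 32768;
    }
    return out;
  }
}
```

In the read loop, each `value` would pass through `decoder.push(value)` first, and the resulting `Float32Array` would go to the playback buffer.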

Pick the right model

Use `vocenza-tts-1-flash` for interactive, latency-sensitive playback and `vocenza-tts-1` when you can buffer and want maximum fidelity.

Pronunciation control

Wrap text in SSML-style tags to fine-tune delivery:

<break time="400ms"/> Let me think about that.
<emphasis level="strong">Absolutely.</emphasis>
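
When the tagged text is built from user-supplied strings, small helpers keep the markup well-formed. The sketch below reproduces the two tags shown above. The helper names `escapeText`, `pause`, and `emphasize` are hypothetical, only the `strong` level appears on this page, and whether the API expects XML entity escaping for literal `<` or `&` is an assumption to verify.

```typescript
// Hypothetical helpers for composing tagged input text.

// Escapes characters that would otherwise be read as markup.
// Assumption: the API accepts XML entities for these characters.
function escapeText(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Builds a pause of the given length in milliseconds.
function pause(ms: number): string {
  return `<break time="${Math.round(ms)}ms"/>`;
}

// Wraps text in a strong-emphasis tag, the only level shown in this guide.
function emphasize(text: string): string {
  return `<emphasis level="strong">${escapeText(text)}</emphasis>`;
}

// Composes the same text as the example in this section.
const taggedInput = [
  `${pause(400)} ${escapeText("Let me think about that.")}`,
  emphasize("Absolutely."),
].join("\n");
```

`taggedInput` can then be passed as the `input` field of a request.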

Output formats

Format	Container	Best for
`mp3`	MPEG	General playback, small files
`wav`	RIFF	Editing, archival
`opus`	Ogg	Streaming over the network
`pcm16`	raw	Realtime playback buffers
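
Because `pcm16` is raw audio with no container, most players cannot open it directly. If you capture a raw stream and later want a playable file, you can prepend a standard 44-byte RIFF/WAVE header. The sketch below assumes mono, 24000 Hz, little-endian samples. Those values are assumptions, not documented defaults, so pass the values that match your request. The function name `wrapPcm16AsWav` is hypothetical.

```typescript
// Hypothetical helper: wraps raw 16-bit PCM bytes in a minimal WAV container.
// Assumed defaults: 24000 Hz, mono, little-endian samples.
function wrapPcm16AsWav(
  pcm: Uint8Array,
  sampleRate: number = 24000,
  channels: number = 1,
): Uint8Array {
  const header = new DataView(new ArrayBuffer(44));
  const writeTag = (offset: number, tag: string): void => {
    for (let i = 0; i < tag.length; i++) {
      header.setUint8(offset + i, tag.charCodeAt(i));
    }
  };
  const blockAlign = channels * 2; // bytes per sample frame

  writeTag(0, "RIFF");
  header.setUint32(4, 36 + pcm.length, true); // file size minus the first 8 bytes
  writeTag(8, "WAVE");
  writeTag(12, "fmt ");
  header.setUint32(16, 16, true); // size of the fmt chunk
  header.setUint16(20, 1, true); // audio format 1 = uncompressed PCM
  header.setUint16(22, channels, true);
  header.setUint32(24, sampleRate, true);
  header.setUint32(28, sampleRate * blockAlign, true); // byte rate
  header.setUint16(32, blockAlign, true);
  header.setUint16(34, 16, true); // bits per sample
  writeTag(36, "data");
  header.setUint32(40, pcm.length, true);

  const out = new Uint8Array(44 + pcm.length);
  out.set(new Uint8Array(header.buffer), 0);
  out.set(pcm, 44);
  return out;
}
```

If you only need a WAV file, requesting `format: "wav"` directly is simpler. This approach is mainly useful when the same `pcm16` stream feeds both live playback and a recording.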
