Text to Speech

Generate expressive, studio-quality speech from text — buffered or streamed.

The /speech endpoint turns text into natural speech. Pick a model, a voice, and an output format; receive audio back as a binary stream.

Basic request

curl https://api.vocenza.com/v1/speech \
  -H "Authorization: Bearer $VOCENZA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "vocenza-tts-1",
    "voice": "aria",
    "input": "The quick brown fox jumps over the lazy dog.",
    "format": "mp3"
  }' --output speech.mp3

Parameters

Parameter    Type    Description
model        string  TTS model id. vocenza-tts-1 (quality) or vocenza-tts-1-flash (lowest latency).
voice        string  Voice id, e.g. aria, atlas, nova. See GET /voices.
input        string  The text to synthesize. Up to 5,000 characters per request.
format       string  mp3, wav, opus, or pcm16. Defaults to mp3.
speed        number  Playback rate from 0.5 to 2.0. Defaults to 1.0.
sample_rate  number  Output sample rate in Hz (e.g. 24000).
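
The limits in the table can be checked on the client before a request is sent. The following is a minimal sketch, not part of any official SDK: `SpeechRequest` and `validateSpeechRequest` are hypothetical names, and measuring the 5,000-character limit in UTF-16 code units is an assumption about how the service counts characters.

```typescript
// Hypothetical request shape mirroring the parameter table above.
interface SpeechRequest {
  model: string;
  voice: string;
  input: string;
  format?: string;
  speed?: number;
  sample_rate?: number;
}

const ALLOWED_FORMATS = ["mp3", "wav", "opus", "pcm16"];
const MAX_INPUT_CHARS = 5000;

// Returns a list of problems; an empty list means the request passes these checks.
function validateSpeechRequest(req: SpeechRequest): string[] {
  const problems: string[] = [];
  // Assumption: the limit is measured in UTF-16 code units (JavaScript string length).
  if (req.input.length > MAX_INPUT_CHARS) {
    problems.push(`input has ${req.input.length} characters; the limit is ${MAX_INPUT_CHARS}`);
  }
  if (req.format !== undefined && ALLOWED_FORMATS.indexOf(req.format) === -1) {
    problems.push(`format must be one of: ${ALLOWED_FORMATS.join(", ")}`);
  }
  // Written as a negated range check so that NaN is rejected as well.
  if (req.speed !== undefined && !(req.speed >= 0.5 && req.speed <= 2.0)) {
    problems.push("speed must be between 0.5 and 2.0");
  }
  return problems;
}
```

Calling `validateSpeechRequest` just before the request is built lets an application surface these problems locally instead of waiting for an error response.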

Streaming

For the lowest time-to-first-audio, stream the response and play chunks as they arrive instead of waiting for the full file.

stream.ts
const res = await fetch("https://api.vocenza.com/v1/speech", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.VOCENZA_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "vocenza-tts-1-flash",
    voice: "aria",
    input: "Streaming keeps latency low for live experiences.",
    format: "pcm16",
  }),
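});

// Fail fast on HTTP errors so an error body is never sent to the audio output.
if (!res.ok || !res.body) {
  throw new Error(`Speech request failed: ${res.status} ${res.statusText}`);
}

// Read the binary stream chunk by chunk as it arrives.
const reader = res.body.getReader();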
for (;;) {
  const { done, value } = await reader.read();
  if (done) break;
  enqueueToAudioOutput(value); // your playback buffer
}
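
Network chunks are not guaranteed to end on a sample boundary, so a pcm16 stream can deliver half of a 16-bit sample at the end of one chunk and the other half at the start of the next. The sketch below shows one way to handle this before passing audio to something like `enqueueToAudioOutput`. `Pcm16Decoder` is a hypothetical helper, and little-endian sample order is an assumption that the text above does not confirm.

```typescript
// Hypothetical helper: converts streamed pcm16 bytes to Float32 samples,
// carrying a stray trailing byte over to the next chunk.
class Pcm16Decoder {
  private leftover: number | null = null;

  // Returns the complete samples contained in this chunk, scaled to [-1, 1).
  push(chunk: Uint8Array): Float32Array {
    let bytes = chunk;
    if (this.leftover !== null) {
      // Prepend the byte left over from the previous chunk.
      bytes = new Uint8Array(chunk.length + 1);
      bytes[0] = this.leftover;
      bytes.set(chunk, 1);
      this.leftover = null;
    }
    const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
    const sampleCount = Math.floor(bytes.length / 2);
    if (bytes.length % 2 === 1) {
      // Odd length: keep the last byte until its partner arrives.
      this.leftover = view.getUint8(bytes.length - 1);
    }
    const samples = new Float32Array(sampleCount);
    for (let i = 0; i < sampleCount; i++) {
      // Assumption: samples are signed 16-bit little-endian.
      samples[i] = view.getInt16(i * 2, true) / 32768;
    }
    return samples;
  }
}
```

In the read loop, each `value` would pass through `decoder.push(value)` first, and the resulting samples would go to the playback buffer.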

Pick the right model

Use vocenza-tts-1-flash for interactive, latency-sensitive playback and vocenza-tts-1 when you can buffer and want maximum fidelity.

Pronunciation control

Add SSML-style tags to the input text to fine-tune pauses and emphasis:

<break time="400ms"/> Let me think about that.
<emphasis level="strong">Absolutely.</emphasis>
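
When input text comes from users or templates, small helpers can keep the tags well-formed. The following is a sketch under assumptions: `pause`, `emphasis`, and `escapeText` are hypothetical names, only the `strong` level appears in the example above, and escaping `&`, `<`, and `>` assumes the service parses the input as XML-like markup.

```typescript
// Escapes characters that would otherwise be read as markup.
// Assumption: the service treats the input as XML-like text.
function escapeText(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Builds a pause tag such as <break time="400ms"/>.
function pause(milliseconds: number): string {
  return `<break time="${Math.round(milliseconds)}ms"/>`;
}

// Wraps text in an emphasis tag. Only "strong" is shown in the documentation;
// other level values are not confirmed here.
function emphasis(text: string, level: string = "strong"): string {
  return `<emphasis level="${level}">${escapeText(text)}</emphasis>`;
}

// Example: composes the two lines shown above into a single input string.
const input = `${pause(400)} Let me think about that. ${emphasis("Absolutely.")}`;
```

The resulting `input` string can be sent in the `input` field of a /speech request.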

Output formats

Format  Container  Best for
mp3     MPEG       General playback, small files
wav     RIFF       Editing, archival
opus    Ogg        Streaming over the network
pcm16   raw        Realtime playback buffers
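
Because pcm16 has no container, most players cannot open a saved pcm16 response directly. One common workaround is to prepend a standard 44-byte RIFF/WAVE header. The sketch below illustrates that; `wrapPcm16AsWav` is a hypothetical helper, and the defaults (mono, 24000 Hz, little-endian samples) are assumptions that should be matched to the actual request.

```typescript
// Wraps raw 16-bit PCM bytes in a minimal RIFF/WAVE container.
// Assumptions: little-endian samples, mono, 24000 Hz unless overridden.
function wrapPcm16AsWav(pcm: Uint8Array, sampleRate: number = 24000, channels: number = 1): Uint8Array {
  const header = new ArrayBuffer(44);
  const view = new DataView(header);
  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };
  const blockAlign = channels * 2; // bytes per sample frame

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + pcm.length, true); // total size minus the first 8 bytes
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // size of the fmt chunk
  view.setUint16(20, 1, true); // audio format 1 = uncompressed PCM
  view.setUint16(22, channels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * blockAlign, true); // bytes per second
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, 16, true); // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, pcm.length, true); // size of the audio data

  const out = new Uint8Array(44 + pcm.length);
  out.set(new Uint8Array(header), 0);
  out.set(pcm, 44);
  return out;
}
```

If a file is the end goal, requesting `wav` directly avoids this step; the wrapper is mainly useful for capturing audio that was already streamed as pcm16.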