Realtime API
Full-duplex, low-latency voice conversations over a single websocket, with barge-in.
The Realtime API powers natural, two-way voice conversations. You stream the user's microphone audio up and receive synthesized speech and transcripts down — all over one websocket, with interruption (barge-in) handled for you.
Connect
Authenticate with an ephemeral token when connecting from a browser.
// The ephemeral token is passed as a WebSocket subprotocol, because browsers
// cannot set custom headers on WebSocket connections.
const ws = new WebSocket(
  "wss://api.vocenza.com/v1/realtime?model=vocenza-realtime-1",
  ["vocenza", `token.${ephemeralToken}`],
);

// Configure the session as soon as the socket opens.
ws.onopen = () => {
  ws.send(
    JSON.stringify({
      type: "session.update",
      session: {
        voice: "aria",
        instructions: "You are a friendly support agent. Keep replies short.",
        input_audio_format: "pcm16",
        output_audio_format: "pcm16",
      },
    }),
  );
};
Send and receive
Send microphone frames as you capture them, then read events as the model responds:
// Up: append captured PCM16 frames
ws.send(JSON.stringify({ type: "input_audio.append", audio: base64Frame }));

// Down: handle streamed events
ws.onmessage = (e) => {
  const event = JSON.parse(e.data);
  switch (event.type) {
    case "transcript.delta":
      appendCaption(event.text);
      break;
    case "audio.delta":
      playPcm(event.audio); // base64 PCM16 chunk
      break;
    case "response.done":
      markTurnComplete();
      break;
  }
};
Event types
| Event | Direction | Description |
|---|---|---|
| session.update | client → server | Set voice, instructions, and audio formats. |
| input_audio.append | client → server | Append a chunk of captured audio. |
| input_audio.commit | client → server | Signal end of the user's turn. |
| transcript.delta | server → client | Incremental transcript of either party. |
| audio.delta | server → client | A chunk of synthesized output audio. |
| response.done | server → client | The assistant finished its turn. |
| error | server → client | Something went wrong (see Errors). |
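As a rough illustration of the two client-side audio events in the table, the sketch below converts Float32 samples (the format Web Audio typically hands you) into base64-encoded PCM16 and builds the `input_audio.append` and `input_audio.commit` messages. The helper names (`floatToPcm16`, `bytesToBase64`, `appendMessage`, `commitMessage`) are hypothetical, and it assumes little-endian samples and a commit message with no extra fields; neither the byte order, the sample rate, nor the commit payload is specified on this page, so check them against the full reference.

```javascript
// Convert Float32 samples in [-1, 1] to 16-bit signed PCM bytes.
// Assumption: the API expects little-endian samples.
function floatToPcm16(samples) {
  const bytes = new Uint8Array(samples.length * 2);
  const view = new DataView(bytes.buffer);
  for (let i = 0; i < samples.length; i++) {
    // Clamp first so out-of-range input cannot wrap around.
    const s = Math.max(-1, Math.min(1, samples[i]));
    const scaled = Math.round(s < 0 ? s * 0x8000 : s * 0x7fff);
    view.setInt16(i * 2, scaled, true); // true = little-endian
  }
  return bytes;
}

// Base64-encode raw bytes using the built-in btoa.
function bytesToBase64(bytes) {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary);
}

// Build the message that appends one captured frame.
function appendMessage(samples) {
  return JSON.stringify({
    type: "input_audio.append",
    audio: bytesToBase64(floatToPcm16(samples)),
  });
}

// Build the message that signals the end of the user's turn.
// Assumption: no fields other than `type` are required.
function commitMessage() {
  return JSON.stringify({ type: "input_audio.commit" });
}
```

With an open socket, a capture callback might call `ws.send(appendMessage(frame))` for each frame and `ws.send(commitMessage())` once the user stops speaking.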
Barge-in is automatic
If the user starts talking while the assistant is speaking, Vocenza stops the
current response and starts listening. You'll receive a response.canceled
event so you can flush your playback buffer.
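One way to act on response.canceled is to keep incoming audio in a small queue and empty it when the event arrives, so stale speech stops as soon as the user interrupts. The sketch below is illustrative only: `PlaybackQueue` and `handleServerEvent` are hypothetical names, and a real player would also need to stop whatever chunk is currently sounding.

```javascript
// Hypothetical playback queue: holds base64 PCM16 chunks until the
// audio output is ready to play them.
class PlaybackQueue {
  constructor() {
    this.chunks = [];
  }

  // Add one chunk received in an audio.delta event.
  enqueue(base64Chunk) {
    this.chunks.push(base64Chunk);
  }

  // Take the next chunk to play, or undefined if the queue is empty.
  next() {
    return this.chunks.shift();
  }

  // Drop everything that has not been played yet; returns the count dropped.
  flush() {
    const dropped = this.chunks.length;
    this.chunks.length = 0;
    return dropped;
  }
}

// Route server events to the queue: buffer audio, flush on barge-in.
function handleServerEvent(event, queue) {
  switch (event.type) {
    case "audio.delta":
      queue.enqueue(event.audio);
      break;
    case "response.canceled":
      queue.flush();
      break;
  }
}
```

Inside `ws.onmessage`, the parsed event could be passed to `handleServerEvent(event, queue)` alongside the existing transcript and completion handling.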
Use ephemeral tokens in the browser
Never open a realtime connection with a secret key from client-side code. Mint a short-lived token on your server — see Authentication.