Realtime API

Full-duplex, low-latency voice conversations over a single WebSocket, with barge-in.

The Realtime API powers natural, two-way voice conversations. You stream the user's microphone audio up and receive synthesized speech and transcripts down, all over one WebSocket, with interruption (barge-in) handled for you.

Connect

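Authenticate with an ephemeral token when connecting from a browser. The token is passed in the WebSocket subprotocol list (the second constructor argument), and the session is configured with a session.update message once the socket opens.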

client.ts
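// ephemeralToken: a short-lived token minted by your server (see Authentication).
const ws = new WebSocket(
  "wss://api.vocenza.com/v1/realtime?model=vocenza-realtime-1",
  ["vocenza", `token.${ephemeralToken}`],
);

// Configure the session as soon as the socket opens.
ws.onopen = () => {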
  ws.send(
    JSON.stringify({
      type: "session.update",
      session: {
        voice: "aria",
        instructions: "You are a friendly support agent. Keep replies short.",
        input_audio_format: "pcm16",
        output_audio_format: "pcm16",
      },
    }),
  );
};

Send and receive

Send microphone frames as you capture them, then read events as the model responds:

// Up: append captured PCM16 frames
ws.send(JSON.stringify({ type: "input_audio.append", audio: base64Frame }));

// Down: handle streamed events
ws.onmessage = (e) => {
  const event = JSON.parse(e.data);
  switch (event.type) {
    case "transcript.delta":
      appendCaption(event.text);
      break;
    case "audio.delta":
      playPcm(event.audio); // base64 PCM16 chunk
      break;
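    case "response.done":
      markTurnComplete();
      break;
    case "response.canceled":
      flushPlayback(); // app-defined helper: drop audio queued for the interrupted reply
      break;
    case "error":
      console.error("Realtime error:", event);
      break;
  }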
};
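The snippets above take base64Frame and event.audio as given. As a rough sketch of how those base64 PCM16 strings might be produced and consumed in a browser, the helpers below convert Web Audio float samples to 16-bit PCM and back. The function names are illustrative and not part of the Vocenza API, and little-endian byte order is an assumption; sample rate and channel layout are not covered here.

```typescript
// Convert float samples (nominally -1..1, as produced by Web Audio) to signed 16-bit PCM.
function floatToPcm16(samples: Float32Array): Int16Array {
  const out = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clamp so out-of-range input cannot overflow
    out[i] = s < 0 ? Math.round(s * 0x8000) : Math.round(s * 0x7fff);
  }
  return out;
}

// Serialize PCM16 samples to a base64 string.
// Assumption: the wire format is little-endian; the source only says "pcm16".
function pcm16ToBase64(pcm: Int16Array): string {
  const bytes = new Uint8Array(pcm.length * 2);
  const view = new DataView(bytes.buffer);
  for (let i = 0; i < pcm.length; i++) {
    view.setInt16(i * 2, pcm[i], true);
  }
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary);
}

// Decode a base64 PCM16 chunk (for example, the audio field of an audio.delta event).
function base64ToPcm16(b64: string): Int16Array {
  const binary = atob(b64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  const view = new DataView(bytes.buffer);
  const out = new Int16Array(bytes.length >> 1);
  for (let i = 0; i < out.length; i++) {
    out[i] = view.getInt16(i * 2, true);
  }
  return out;
}
```

With these in place, a captured frame could be sent as ws.send(JSON.stringify({ type: "input_audio.append", audio: pcm16ToBase64(floatToPcm16(samples)) })), where samples is a Float32Array from your capture pipeline.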

Event types

Event               Direction        Description
session.update      client → server  Set voice, instructions, and audio formats.
input_audio.append  client → server  Append a chunk of captured audio.
input_audio.commit  client → server  Signal end of the user's turn.
transcript.delta    server → client  Incremental transcript of either party.
audio.delta         server → client  A chunk of synthesized output audio.
response.done       server → client  The assistant finished its turn.
error               server → client  Something went wrong (see Errors).
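One way to keep client code honest about these events is to model them as TypeScript discriminated unions. The sketch below is inferred from the table and the snippets on this page, not taken from an official SDK: the type names, parseServerEvent, and encodeClientEvent are illustrative, response.canceled comes from the barge-in note, and the assumption that input_audio.commit carries no fields beyond type is a guess.

```typescript
// Client → server messages, with field names taken from the examples on this page.
type ClientEvent =
  | {
      type: "session.update";
      session: {
        voice?: string;
        instructions?: string;
        input_audio_format?: string;
        output_audio_format?: string;
      };
    }
  | { type: "input_audio.append"; audio: string }
  | { type: "input_audio.commit" }; // assumption: no extra fields

// Server → client messages. The error payload is not described here, so no fields are modeled.
type ServerEvent =
  | { type: "transcript.delta"; text: string }
  | { type: "audio.delta"; audio: string }
  | { type: "response.done" }
  | { type: "response.canceled" }
  | { type: "error" };

function encodeClientEvent(event: ClientEvent): string {
  return JSON.stringify(event);
}

// Parse a raw websocket message; returns null for malformed or unrecognized events.
function parseServerEvent(raw: string): ServerEvent | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const fields = data as Record<string, unknown>;
  const text = fields["text"];
  const audio = fields["audio"];
  switch (fields["type"]) {
    case "transcript.delta":
      return typeof text === "string" ? { type: "transcript.delta", text } : null;
    case "audio.delta":
      return typeof audio === "string" ? { type: "audio.delta", audio } : null;
    case "response.done":
      return { type: "response.done" };
    case "response.canceled":
      return { type: "response.canceled" };
    case "error":
      return { type: "error" };
    default:
      return null;
  }
}
```

For example, ending the user's turn could then be written as ws.send(encodeClientEvent({ type: "input_audio.commit" })), and the compiler will flag a misspelled event name.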

Barge-in is automatic

If the user starts talking while the assistant is speaking, Vocenza stops the current response and starts listening. You'll receive a response.canceled event so you can flush your playback buffer.
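Flushing matters because audio chunks usually arrive faster than they play, so several may still be queued when the user interrupts. A minimal sketch of a client-side queue that is cleared on response.canceled follows; PlaybackQueue and handlePlaybackEvent are hypothetical names for this illustration, not part of the Vocenza API.

```typescript
// Holds base64 audio chunks that have been received but not yet played.
class PlaybackQueue {
  private chunks: string[] = [];

  enqueue(chunk: string): void {
    this.chunks.push(chunk);
  }

  // Next chunk to hand to the audio output, or undefined when the queue is empty.
  next(): string | undefined {
    return this.chunks.shift();
  }

  // Drop everything still waiting to play; returns how many chunks were discarded.
  flush(): number {
    const dropped = this.chunks.length;
    this.chunks = [];
    return dropped;
  }

  size(): number {
    return this.chunks.length;
  }
}

// Route the two events that affect playback: queue new audio, flush on barge-in.
function handlePlaybackEvent(
  queue: PlaybackQueue,
  event: { type: string; audio?: string },
): void {
  if (event.type === "audio.delta" && typeof event.audio === "string") {
    queue.enqueue(event.audio);
  } else if (event.type === "response.canceled") {
    queue.flush();
  }
}
```

In a real player you would likely also stop whatever chunk is currently sounding, since clearing the queue only prevents the remaining chunks from starting.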

Use ephemeral tokens in the browser

Never open a realtime connection with a secret key from client-side code. Mint a short-lived token on your server instead (see Authentication).