LLM Router

Send a prompt and let Vocenza route it to the best model for cost, speed, and quality.

The LLM Router accepts a standard chat request and automatically selects the model that best fits your constraints, so you don't have to hard-code a single provider.

Route a request

curl https://api.vocenza.com/v1/chat/completions \
  -H "Authorization: Bearer $VOCENZA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "router/auto",
    "messages": [
      { "role": "user", "content": "Summarize this call transcript in 3 bullets." }
    ],
    "routing": { "optimize": "latency", "max_cost_per_1k": 0.5 }
  }'
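The same request can be built from Python using only the standard library. This is a minimal sketch: the helper names build_routed_request and send are illustrative and not part of any Vocenza SDK, and it assumes the endpoint accepts exactly the JSON body shown in the curl example.

```python
import json
import urllib.request

# Endpoint taken from the curl example.
API_URL = "https://api.vocenza.com/v1/chat/completions"


def build_routed_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build the same POST request as the curl example, without sending it."""
    body = {
        "model": "router/auto",  # let the router choose the concrete model
        "messages": [{"role": "user", "content": prompt}],
        "routing": {"optimize": "latency", "max_cost_per_1k": 0.5},
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(req: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, send(build_routed_request(prompt, os.environ["VOCENZA_API_KEY"])) returns the decoded response dictionary. Splitting construction from sending keeps the request easy to inspect without touching the network.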

The response includes the model that was chosen so routing stays transparent:

{
  "model": "router/auto",
  "resolved_model": "fast-mini-7b",
  "choices": [{ "message": { "role": "assistant", "content": "• …" } }]
}
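To keep routing decisions visible in your own logs, you can record resolved_model next to the model you requested. A small sketch, assuming the response shape shown above; the helper names are hypothetical and the sample content stays elided as in the docs.

```python
import json

# Sample response copied from the docs; the assistant text is elided there too.
SAMPLE = """
{
  "model": "router/auto",
  "resolved_model": "fast-mini-7b",
  "choices": [{ "message": { "role": "assistant", "content": "• …" } }]
}
"""


def describe_routing(response: dict) -> str:
    """Return a log line pairing the requested model with the one that served it."""
    # "model" echoes the request; "resolved_model" is the router's concrete pick.
    return f'{response["model"]} -> {response["resolved_model"]}'


def first_reply(response: dict) -> str:
    """Extract the assistant text from the first choice."""
    return response["choices"][0]["message"]["content"]


resp = json.loads(SAMPLE)
print(describe_routing(resp))  # router/auto -> fast-mini-7b
```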

Routing options

Field             Type       Description
optimize          string     latency, cost, or quality. Defaults to quality.
max_cost_per_1k   number     Skip models above this $/1K-token ceiling.
require           string[]   Capabilities the model must support, e.g. ["json", "tools"].
fallbacks         string[]   Explicit models to try if the primary pick fails.
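The router's internal selection logic isn't described on this page, so the following is only a hypothetical sketch of how the four options could interact: drop models that exceed max_cost_per_1k or lack a required capability, rank the survivors by the optimize goal, and walk the fallbacks list if a call fails. The catalog, prices, latencies, and quality scores are invented; only fast-mini-7b appears in the docs.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional, Sequence, Tuple


# Hypothetical catalog: every number here is made up for illustration.
@dataclass
class Model:
    name: str
    cost_per_1k: float   # $ per 1K tokens
    latency_ms: int      # typical time to first token
    quality: float       # higher is better
    capabilities: set[str] = field(default_factory=set)


CATALOG = [
    Model("fast-mini-7b", 0.10, 120, 0.60, {"json"}),
    Model("balanced-34b", 0.40, 350, 0.80, {"json", "tools"}),
    Model("frontier-xl", 2.00, 900, 0.95, {"json", "tools"}),
]


def pick_model(optimize: str = "quality",
               max_cost_per_1k: Optional[float] = None,
               require: Sequence[str] = ()) -> str:
    """Filter the catalog by the constraints, then rank by the optimize goal."""
    candidates = [
        m for m in CATALOG
        if (max_cost_per_1k is None or m.cost_per_1k <= max_cost_per_1k)
        and set(require) <= m.capabilities
    ]
    if not candidates:
        raise LookupError("no model satisfies the routing constraints")
    sort_keys = {
        "latency": lambda m: m.latency_ms,   # lower is better
        "cost": lambda m: m.cost_per_1k,     # lower is better
        "quality": lambda m: -m.quality,     # higher is better, so negate
    }
    return min(candidates, key=sort_keys[optimize]).name


def call_with_fallbacks(primary: str, fallbacks: Sequence[str],
                        call: Callable[[str], str]) -> Tuple[str, str]:
    """Try the primary pick, then each explicit fallback in list order."""
    last_error: Optional[Exception] = None
    for name in [primary, *fallbacks]:
        try:
            return name, call(name)
        except Exception as err:  # this sketch treats any failure as retryable
            last_error = err
    raise RuntimeError("all models failed") from last_error
```

With these made-up numbers, pick_model("latency", 0.5) returns fast-mini-7b, while pick_model("quality", 0.5, ["tools"]) returns balanced-34b because frontier-xl is over the cost ceiling and fast-mini-7b lacks tools.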

Why route?

One API, many models

Routing decouples your app from any single provider. When a faster or cheaper model ships, the router adopts it automatically, with no code change and no redeploy.

Streaming

Set "stream": true to receive server-sent events. The router commits to a model before the first token, so resolved_model arrives in the opening chunk.
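A client can read resolved_model from the first event and accumulate the text from the rest. This sketch assumes an OpenAI-style stream in which each event is a single data: line holding a JSON chunk with choices[].delta.content, and the stream ends with data: [DONE]. That chunk format isn't specified on this page, and the parser deliberately ignores multi-line data fields and other SSE features.

```python
import json
from typing import Iterable, Iterator, Optional, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the JSON payload of each data: event; stop at the [DONE] sentinel."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        yield json.loads(payload)


def read_stream(lines: Iterable[str]) -> Tuple[Optional[str], str]:
    """Return (resolved_model, full_text) from a streamed routed completion."""
    resolved_model: Optional[str] = None
    parts = []
    for i, chunk in enumerate(parse_sse(lines)):
        if i == 0:
            # Per the docs, resolved_model arrives in the opening chunk.
            resolved_model = chunk.get("resolved_model")
        for choice in chunk.get("choices", []):
            parts.append(choice.get("delta", {}).get("content") or "")
    return resolved_model, "".join(parts)
```

With urllib.request.urlopen, you could pass (line.decode("utf-8") for line in resp) as lines, since the response object iterates line by line.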