LLM Router
Send a prompt and let Vocenza route it to the best model for cost, speed, and quality.
The LLM Router accepts a standard chat completions request and automatically selects the model that best fits your constraints, so you don't have to hard-code a single provider.
Route a request
```bash
curl https://api.vocenza.com/v1/chat/completions \
  -H "Authorization: Bearer $VOCENZA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "router/auto",
    "messages": [
      { "role": "user", "content": "Summarize this call transcript in 3 bullets." }
    ],
    "routing": { "optimize": "latency", "max_cost_per_1k": 0.5 }
  }'
```

The response includes `resolved_model`, the model the router actually chose, so routing stays transparent:
```json
{
  "model": "router/auto",
  "resolved_model": "fast-mini-7b",
  "choices": [{ "message": { "role": "assistant", "content": "• …" } }]
}
```

Routing options
| Field | Type | Description |
|---|---|---|
| `optimize` | string | Which dimension to prioritize: `latency`, `cost`, or `quality`. Defaults to `quality`. |
| `max_cost_per_1k` | number | Skip models priced above this ceiling, in $ per 1K tokens. |
| `require` | string[] | Capabilities the model must support, e.g. `["json", "tools"]`. |
| `fallbacks` | string[] | Explicit models to try if the primary pick fails. |
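As a minimal sketch, the helper below assembles the same request body as the curl example from Python and reads `resolved_model` from a response. It assumes the endpoint accepts exactly the fields documented in the table; `build_router_request`, `resolved_model_of`, and `send` are hypothetical helper names, and the client-side check on `optimize` simply mirrors the three values listed above. Unset options are left out of the body so the router applies its own defaults. Only the Python standard library is used; `send` performs a real network call and is not invoked here.

```python
import json
import os
import urllib.request
from typing import Any, Optional

API_URL = "https://api.vocenza.com/v1/chat/completions"
# Values listed in the routing options table; this client-side check is an assumption.
OPTIMIZE_VALUES = {"latency", "cost", "quality"}


def build_router_request(
    messages: list[dict[str, str]],
    optimize: Optional[str] = None,
    max_cost_per_1k: Optional[float] = None,
    require: Optional[list[str]] = None,
    fallbacks: Optional[list[str]] = None,
) -> dict[str, Any]:
    """Hypothetical helper: build a router/auto body, omitting unset routing options."""
    if optimize is not None and optimize not in OPTIMIZE_VALUES:
        raise ValueError(f"optimize must be one of {sorted(OPTIMIZE_VALUES)}, got {optimize!r}")
    routing: dict[str, Any] = {}
    if optimize is not None:
        routing["optimize"] = optimize
    if max_cost_per_1k is not None:
        routing["max_cost_per_1k"] = max_cost_per_1k
    if require:
        routing["require"] = list(require)
    if fallbacks:
        routing["fallbacks"] = list(fallbacks)
    body: dict[str, Any] = {"model": "router/auto", "messages": messages}
    if routing:  # no "routing" key at all lets the server use its defaults
        body["routing"] = routing
    return body


def resolved_model_of(response: dict[str, Any]) -> str:
    """Hypothetical helper: return the model the router actually picked."""
    return response["resolved_model"]


def send(body: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: POST the body using the key in VOCENZA_API_KEY."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ['VOCENZA_API_KEY']}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Same request as the curl example above.
body = build_router_request(
    [{"role": "user", "content": "Summarize this call transcript in 3 bullets."}],
    optimize="latency",
    max_cost_per_1k=0.5,
)
```

Calling `resolved_model_of(send(body))` would then return the chosen model, such as `fast-mini-7b` in the sample response. Omitting unset fields, rather than sending `null`, avoids guessing how the server treats explicit nulls.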
Why route?
One API, many models
Routing decouples your app from any single provider. When a faster or cheaper model ships, the router adopts it automatically — no code change, no redeploy.
Streaming
Set `"stream": true` to receive server-sent events. The router commits to a model before the first token, so `resolved_model` arrives in the opening chunk.
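A sketch of consuming the stream follows, showing how a client can pick up `resolved_model` from the first event and accumulate the text after it. The chunk layout is an assumption not documented above: it borrows the OpenAI-style convention of one JSON object per `data:` line, token text under `choices[].delta.content`, and a closing `data: [DONE]` sentinel. `iter_sse_data` and `stream_with_model` are hypothetical names.

```python
import json
from typing import Any, Iterable, Iterator, Optional


def iter_sse_data(lines: Iterable[str]) -> Iterator[dict[str, Any]]:
    """Yield the JSON payload of each `data:` line in a server-sent event stream.

    Assumes single-line payloads and an OpenAI-style `data: [DONE]` terminator.
    """
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.startswith("data:"):
            continue  # skip blank separators, ":" comments, and event:/id: fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        yield json.loads(payload)


def stream_with_model(lines: Iterable[str]) -> tuple[Optional[str], str]:
    """Return (resolved_model from the opening chunk, concatenated streamed text)."""
    resolved: Optional[str] = None
    parts: list[str] = []
    for i, chunk in enumerate(iter_sse_data(lines)):
        if i == 0:
            # The router has already chosen a model when the first chunk is sent.
            resolved = chunk.get("resolved_model")
        for choice in chunk.get("choices", []):
            # Assumed delta shape; role-only deltas carry no content.
            parts.append(choice.get("delta", {}).get("content") or "")
    return resolved, "".join(parts)


sample = [
    'data: {"model": "router/auto", "resolved_model": "fast-mini-7b", "choices": [{"delta": {"role": "assistant"}}]}\n',
    "\n",
    'data: {"choices": [{"delta": {"content": "Hello"}}]}\n',
    "\n",
    "data: [DONE]\n",
]
print(stream_with_model(sample))  # → ('fast-mini-7b', 'Hello')
```

With a live `urllib.request.urlopen` response, the same function can be fed `(b.decode("utf-8") for b in resp)`, since iterating the response yields raw byte lines.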