    more testing


    Voice AI lives or dies on latency. Above 500ms, users start talking over the agent. Above 800ms, they hang up. Our north star has been to keep the round-trip from "user stops speaking" to "agent starts speaking" under 200ms, end-to-end, across STT, the LLM, and TTS. This post walks through how we got there.

    What "200ms" actually means

    The clock starts when our VAD detects end-of-utterance. It stops when the first audio packet of the agent's response leaves our edge. That window includes: VAD finalization, STT post-processing, LLM time-to-first-token, TTS first-chunk synthesis, and network egress to the carrier.
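One way to keep that window honest is to timestamp each stage boundary per turn. The sketch below is illustrative only: the stage names come from the list above, but `TurnTimer` and its methods are hypothetical names, and it treats the stages as strictly sequential marks, which is a simplification.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Optional

# Stage names follow the list in the text; everything else here is hypothetical.
STAGES = (
    "vad_finalization",
    "stt_post_processing",
    "llm_first_token",
    "tts_first_chunk",
    "network_egress",
)


@dataclass
class TurnTimer:
    """Records when each stage of one conversational turn completes."""

    start: float = field(default_factory=time.monotonic)
    marks: Dict[str, float] = field(default_factory=dict)

    def mark(self, stage: str, now: Optional[float] = None) -> None:
        """Record the completion time of `stage` (monotonic seconds)."""
        if stage not in STAGES:
            raise ValueError(f"unknown stage: {stage}")
        self.marks[stage] = time.monotonic() if now is None else now

    def breakdown_ms(self) -> Dict[str, float]:
        """Per-stage durations in milliseconds, in pipeline order."""
        out, prev = {}, self.start
        for stage in STAGES:
            if stage in self.marks:
                out[stage] = (self.marks[stage] - prev) * 1000.0
                prev = self.marks[stage]
        return out

    def total_ms(self) -> float:
        """Time from end-of-utterance to the last recorded stage."""
        return sum(self.breakdown_ms().values())
```

Calling `mark()` as each stage finishes and logging `breakdown_ms()` per call shows which stage is eating the budget, rather than only the total.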

    Three things matter more than anything else: pipelining the stages, minimizing buffers between them, and never blocking on slow tails.

    Pipelining the stages

    The naive pipeline runs STT → LLM → TTS sequentially. We never wait for STT to finalize before kicking off the LLM. Instead, we send partial transcripts to a "thinker" stage that drafts the likely response while the user is still finishing their sentence. By the time end-of-utterance fires, the LLM has usually already produced 30+ tokens of a candidate response, which we either keep or discard based on the final transcript.
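A minimal sketch of the keep-or-discard idea, using `asyncio`. The `Thinker` class, the `draft_response` stand-in for the LLM, and the exact-match rule for keeping a draft are all assumptions made for illustration; the post does not say how the real system decides.

```python
import asyncio


async def draft_response(transcript: str) -> str:
    """Stand-in for an LLM call; a real one would stream tokens."""
    await asyncio.sleep(0.01)
    return f"reply to: {transcript}"


class Thinker:
    """Keeps one speculative draft in flight for the latest partial transcript."""

    def __init__(self, llm=draft_response):
        self._llm = llm
        self._basis = None  # partial transcript the current draft is based on
        self._task = None   # asyncio.Task producing the draft

    def on_partial(self, partial: str) -> None:
        """Start (or restart) a draft whenever the partial transcript changes."""
        if partial == self._basis:
            return  # a draft for this exact text is already in flight
        if self._task is not None:
            self._task.cancel()  # stale draft: the transcript has moved on
        self._basis = partial
        self._task = asyncio.create_task(self._llm(partial))

    async def on_final(self, final: str):
        """Return (response, reused_draft) once end-of-utterance fires."""
        if self._task is not None and self._basis == final:
            return await self._task, True  # keep the speculative draft
        if self._task is not None:
            self._task.cancel()  # discard it and start over
        return await self._llm(final), False
```

Exact string equality is the simplest possible keep rule. A production system would more likely tolerate small differences between the last partial and the final transcript, since that is what makes the head start pay off often enough.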

    Buffers are the enemy

    Every TCP buffer between stages is latency you pay for nothing. We replaced the gRPC streams between STT and the orchestrator with shared-memory ring buffers, and between the orchestrator and TTS with a single Unix domain socket. That alone shaved 40ms off p50.
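For readers who have not worked with ring buffers, here is a small single-producer, single-consumer sketch. It is not the implementation described above: `RingBuffer` is a hypothetical class, and it keeps its read and write counters in ordinary Python attributes, whereas a real cross-process version would also place those counters in shared memory and take care with memory ordering.

```python
class RingBuffer:
    """Fixed-capacity byte ring buffer for one producer and one consumer."""

    def __init__(self, capacity: int, buf=None):
        # `buf` may be any writable buffer of at least `capacity` bytes,
        # for example multiprocessing.shared_memory.SharedMemory(...).buf.
        self._buf = memoryview(buf if buf is not None else bytearray(capacity))
        self._cap = capacity
        self._head = 0  # total bytes ever written
        self._tail = 0  # total bytes ever read

    def write(self, data: bytes) -> int:
        """Copy in as much of `data` as fits; return bytes written. Never blocks."""
        n = min(len(data), self._cap - (self._head - self._tail))
        pos = self._head % self._cap
        first = min(n, self._cap - pos)  # bytes that fit before wrapping
        if first:
            self._buf[pos:pos + first] = data[:first]
        if n > first:
            self._buf[0:n - first] = data[first:n]  # wrapped remainder
        self._head += n
        return n

    def read(self, max_bytes: int) -> bytes:
        """Return up to `max_bytes` of unread data, oldest first."""
        n = min(max_bytes, self._head - self._tail)
        pos = self._tail % self._cap
        first = min(n, self._cap - pos)
        out = bytes(self._buf[pos:pos + first]) + bytes(self._buf[0:n - first])
        self._tail += n
        return out
```

The point of the design is that a write is a memory copy with no syscall and no kernel-side socket buffer in between, and a full buffer is reported immediately instead of blocking the producer.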

    The slow-tail problem

    Average latency is a lie. What you care about is the worst 5% of calls. Our biggest wins came from killing slow tails: aggressive timeouts on TTS, parallel speculative decoding on the LLM, and warm pools of pre-initialized GPU workers.
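As an illustration of the timeout part, the sketch below wraps a call in a hard deadline with `asyncio.wait_for` and switches to a fallback when the deadline is missed. What happens after a timeout is an assumption here (the post does not say); `with_deadline`, `slow_tts`, and `fast_tts` are hypothetical names.

```python
import asyncio


async def with_deadline(primary, fallback, timeout_s: float):
    """Await `primary()`; if it misses the deadline, cancel it and use `fallback()`."""
    try:
        # wait_for cancels the primary coroutine when the timeout expires.
        return await asyncio.wait_for(primary(), timeout=timeout_s)
    except asyncio.TimeoutError:
        return await fallback()


async def slow_tts() -> str:
    """Stand-in for a synthesis call stuck in a slow tail."""
    await asyncio.sleep(1.0)
    return "slow"


async def fast_tts() -> str:
    """Stand-in for a second attempt, e.g. on a worker from a warm pool."""
    return "fast"
```

With a deadline set well below the tail latency of the primary call, the worst case becomes roughly the deadline plus the fallback's latency rather than whatever the slow call happens to take.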

    Where we are now

    p50 sits at 180ms, p95 at 290ms, and p99 at 450ms. The median is under our 200ms target, but the tail is not. That p99 is what we're working on next.
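For reference, percentiles like these can be computed from raw per-call latencies with the nearest-rank method. This is a generic sketch, not necessarily how the figures above were produced; `percentile` is a hypothetical helper.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Rank is 1-based; clamp to 1 so that p = 0 returns the minimum.
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]
```

Nearest-rank always returns an observed latency rather than an interpolated one, which is convenient when a specific slow call needs to be traced.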

    Written by Alok
