closed beta · 247 builders ahead

Voice AI at under 1¢ a minute, not 18¢.

No magic. Just the engineering
most voice AI skipped.

— glimpass

Voice AI that gets cheaper every call. Drop-in for every major voice agent API — same code, one tenth the bill.

# inside claude code
/plugin marketplace add glimpass/voice
/plugin install voice@glimpass

// ~/.cursor/mcp.json
{
  "mcpServers": {
    "glimpass": {
      "command": "npx",
      "args": ["-y", "@glimpass/mcp"]
    }
  }
}

# ~/.codex/config.toml
[mcp_servers.glimpass]
command = "npx"
args = ["-y", "@glimpass/mcp"]

* coming soon — join the waitlist for first-access install tokens.

$0.005/min @ 10k · 92ms p99 cached turn · 71% hit-rate, 24h

— the math —

Most voice AI charges you the same on call 1 and call 100,000.

We charge less every time the same conversation happens.

baseline: openai gpt-4o · elevenlabs turbo · deepgram nova-2 · platform-managed
ours: measured floor across 4 production agents above 10k calls

— receipts, not promises —

$432,118

saved across our customers · live · last update +$0.41

— internals —

Built like a database, not a model.

// how a turn flows: user → cache → bot · live in production

/ 01

A tree per agent.

Every agent gets its own tree of conversation paths. After each turn we know exactly which node we're at. The next user input becomes a constant-time lookup against everything ever said at that exact spot.

O(1) lookup
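A minimal sketch of the idea, assuming each node keeps its children in a hash map keyed by the canonical form of the user input. The names `Node`, `ConversationTree`, `children`, and `audio` are illustrative and not taken from Glimpass, whose actual data structure is not published.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """One position in an agent's conversation tree (hypothetical layout)."""
    audio: bytes | None = None                       # cached bot response at this node
    children: dict[str, Node] = field(default_factory=dict)


class ConversationTree:
    def __init__(self) -> None:
        self.root = Node()
        self.current = self.root                     # where the live call is right now

    def record(self, user_key: str, audio: bytes) -> None:
        """Store a new path under the current node and advance to it."""
        node = self.current.children.setdefault(user_key, Node())
        node.audio = audio
        self.current = node

    def lookup(self, user_key: str) -> bytes | None:
        """Average-case O(1): a single dict probe at the current node."""
        node = self.current.children.get(user_key)
        if node is None:
            return None                              # miss: caller falls back to inference
        self.current = node
        return node.audio

    def reset(self) -> None:
        """Start a new call at the root."""
        self.current = self.root
```

Because the lookup only inspects the children of the current node, its cost does not grow with the total number of stored conversations.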

/ 02

A hand-written match function. Not vibe-coded.

User input is reduced to a canonical form before lookup — case-folded, filler-stripped, with intent and entity content promoted into the form itself. Three layered passes decide equivalence. No embeddings. No model. The determinism is the point.

deterministic · no FP
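The real canonicalizer is not published, so the following is only a sketch of the two simplest steps named above, case-folding and filler-stripping. The filler list and the function name `canonicalize` are assumptions for illustration. Intent and entity promotion and the three layered passes are left out.

```python
import re

# Hypothetical filler list; a real system would maintain one per language.
FILLERS = {"um", "uh", "er", "erm", "hmm"}


def canonicalize(utterance: str) -> str:
    """Reduce an utterance to a deterministic lookup key."""
    text = utterance.casefold()                  # case-insensitive comparison
    tokens = re.findall(r"[\w']+", text)         # keep words, drop punctuation
    kept = [t for t in tokens if t not in FILLERS]
    return " ".join(kept)
```

The same input always yields the same key, which is what makes the lookup reproducible and auditable.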

/ 03

Audio, not inference.

On match, the response is the original TTS audio — same voice, same prosody, same timing — served straight from cache. The LLM never runs. The TTS never runs. The user hears the bot in under 100ms.

<100ms turnaround
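As a rough sketch of that hit path: a flat dict stands in for the per-node lookup, and `synthesize` is a hypothetical callable standing in for the full LLM and TTS round trip. None of these names come from Glimpass.

```python
from __future__ import annotations

from typing import Callable


def handle_turn(
    cache: dict[str, bytes],
    key: str,
    synthesize: Callable[[str], bytes],
) -> tuple[bytes, bool]:
    """Return (audio, was_hit). On a hit, synthesize is never called."""
    audio = cache.get(key)
    if audio is not None:
        return audio, True             # serve the stored audio bytes unchanged
    audio = synthesize(key)            # miss: pay for inference once
    cache[key] = audio                 # keep the exact audio for the next caller
    return audio, False
```

Storing the rendered audio rather than the text is what lets a hit skip both the model and the voice synthesis.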

/ 04

Self-tuning, per agent.

Match thresholds tune themselves against measured false-positive and false-negative rates from each agent's actual traffic. Hit-rate monotonically improves with usage.

self-tunes · per agent
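The tuning loop itself is not published. One simple way such a loop could work, shown purely as an assumption, is a fixed-step controller that tightens the threshold when measured false positives exceed a budget and loosens it otherwise. The function name, the `fp_target` budget, and the `step` size are all hypothetical.

```python
def tune_threshold(
    threshold: float,
    fp_rate: float,
    fn_rate: float,
    fp_target: float = 0.001,
    step: float = 0.01,
) -> float:
    """Nudge a match threshold in [0, 1] using measured error rates."""
    if fp_rate > fp_target:
        threshold += step              # too many wrong matches: be stricter
    elif fn_rate > 0.0:
        threshold -= step              # false positives within budget: be looser
    return min(1.0, max(0.0, threshold))
```

Running this per agent on that agent's own traffic keeps one agent's vocabulary from skewing another's threshold.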

// disclosure: the canonicalizer, threshold-tuning loop, and audio-serving internals are not published. A methodology doc is shared at signup.

— note from the engineer —

I built this after watching my own voice-AI bill go to inference returning the same answers, in the same voice, to the same questions — over and over.

The cache underneath isn't a wrapper around someone else's library. It's an original algorithm, designed for voice agents from the first line — informed by systems I've shipped through 20+ million production calls.

glimpass

— founder's note —

— underneath —

And everything else — done right.

Smart Turn v3 detection
<100ms interruption handling
Silero VAD, full-duplex
function calling + MCP tools
every major US carrier · Twilio · Vonage · Telnyx · Plivo · SignalWire
multi-lingual STT, code-switched
voicemail / AMD detection
natural prosody mid-sentence
SOC-ready audit logs

The cache is the part you'll talk about.
The rest is the part you'll forget exists — because it just works.

— migration —

You don't migrate. You just change the URL.

Your existing voice agent code just works.

We expose the same APIs every major voice platform exposes — same routes, same field names, same SDKs. Point your client at our base URL, swap the API key, ship.

No new mental model. No JSON migration. No flow rebuilding. Your runbooks don't change.

v1 · 100% compat
v2 · 100% compat
v3 · in beta

~/sales-agent · zsh ● connected · us-central1

# 1. install your usual agent CLI
$ npm install -g voice-agent-cli

# 2. point it at us
$ export VOICE_BASE_URL=https://api.glimpass.com/v1
$ export VOICE_API_KEY=glimp_live_xxx

# 3. that's it. your code didn't change.
$ voice-agent assistant create \
    --name "Outbound Sales" \
    --voice rachel --model gpt-oss-120b
✓ asst_a1b2c3 created · cost-floor: $0.005/min
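The same switch from application code might look like the sketch below, which reads the two environment variables from the terminal session and builds a request without sending it. The `/assistant` route, the payload fields, and the Bearer auth scheme are assumptions for illustration; the actual routes depend on which platform API you already use.

```python
import json
import os
import urllib.request


def build_request(path: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST against whichever base URL is configured."""
    base_url = os.environ["VOICE_BASE_URL"].rstrip("/")
    api_key = os.environ["VOICE_API_KEY"]
    return urllib.request.Request(
        url=f"{base_url}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",    # auth scheme is an assumption
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Nothing in the function names a vendor, so changing providers is a matter of changing the two environment variables.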

— pricing & deployment —

Honest billing. You win, we win.

recommended

// hosted · share

20% of savings

billed in real time · every cache hit itemized

Bring your own LLM, TTS, and STT keys. We sit in front, measure exactly what we save you, and bill 20% of the delta — itemized per cache hit, deducted in real time. Save nothing? Pay nothing.

real-time deduction · powered by our observability layer
per-call audit trail · every dollar reconciled
methodology doc shared at signup
flat-rate quote available for procurement on request
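The share model reduces to simple arithmetic per cache hit. The sketch below assumes the fee is 20% of the difference between what the turn would have cost on your own keys and what it cost when served from cache; the function name and the example figures are hypothetical, and the billing methodology itself is only shared at signup.

```python
from decimal import Decimal

SHARE = Decimal("0.20")    # 20% of savings, from the pricing card


def fee_for_hit(baseline_cost: Decimal, served_cost: Decimal) -> Decimal:
    """Fee for one cache hit: a share of what the hit saved, never negative."""
    saved = max(baseline_cost - served_cost, Decimal("0"))
    return saved * SHARE
```

For example, a turn that would have cost $0.03 in inference and was served from cache at no cost saves $0.03, so the fee is $0.006 and you keep $0.024. A turn that saves nothing produces a fee of zero.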

// on-prem · enterprise

90% GPU drop

self-hosted models · air-gapped · runs on your cluster

For teams running their own LLM and TTS. Glimpass sits in front of your inference cluster and intercepts cached turns before they ever hit a GPU. Same hardware budget, an order of magnitude more concurrent calls.

10× concurrent capacity on the same hardware
0 data leaves your network · air-gapped install
SOC-friendly audit logs, exportable on request
quote, deployment plan, methodology doc — within 48h
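The 90% and 10× figures are two views of the same ratio, assuming GPU load scales linearly with the share of turns that still reach inference. That linearity is an assumption of this sketch and not a published measurement.

```python
def capacity_multiplier(gpu_drop: float) -> float:
    """Concurrent-call multiplier when a fraction of turns never reaches a GPU."""
    if not 0.0 <= gpu_drop < 1.0:
        raise ValueError("gpu_drop must be in [0, 1)")
    return 1.0 / (1.0 - gpu_drop)
```

With `gpu_drop=0.9`, only one turn in ten reaches the cluster, so the same hardware can carry roughly ten times the calls.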

Prices above cover the AI stack only. Telephony is passed through at carrier cost (~$0.008/min).

— let's talk —

Tell us about your campaign.

We reply within 48 hours. No SDR call. No demo. We send you a quote and a methodology doc — you decide from there.