Conversational AI Agent with Streaming STT and TTS

This template is a technical demonstration of a conversational AI agent that combines streaming speech-to-text (STT) and text-to-speech (TTS). The agent holds real-time voice conversations with users through Deepgram's Agent API, aiming for low latency and natural-sounding responses. Built with Next.js and React, it can serve as a starting point for virtual assistants, educational tutors, and customer service applications.

The numbers

latency: not listed

cost / min: not listed

framework: Custom

The stack

telephony: Web Only

speech-to-text: Deepgram Nova-3

llm: GPT-4o

text-to-speech: Deepgram Aura-2

System prompt

No prompt published.

Config

config.json

{
  "voice_id": "aura-thalia-en",
  "barge_in": true,
  "vad": "silero-vad",
  "websocket_url": "wss://agent.deepgram.com/v1/agent/converse",
  "authentication": "JWT"
}
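
As a rough illustration of how an app might consume this config, the sketch below parses and validates the five fields shown above and uses the barge_in flag to decide when to cut off agent playback. The names AgentConfig, parseAgentConfig, and shouldInterruptPlayback are hypothetical, and the field meanings are inferred from their names rather than from a published schema. Nothing here opens a connection or calls the Deepgram API.

```typescript
// Shape of the config.json shown above. Field meanings are inferred from
// their names; this is an assumption, not a published schema.
interface AgentConfig {
  voice_id: string;
  barge_in: boolean;
  vad: string;
  websocket_url: string;
  authentication: string;
}

// Hypothetical helper: parse the raw JSON text and check each field's type.
function parseAgentConfig(raw: string): AgentConfig {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== "object" || data === null) {
    throw new Error("config must be a JSON object");
  }
  const obj = data as Record<string, unknown>;

  // All of these fields are expected to be non-empty strings.
  for (const key of ["voice_id", "vad", "websocket_url", "authentication"]) {
    const value = obj[key];
    if (typeof value !== "string" || value === "") {
      throw new Error(`config.${key} must be a non-empty string`);
    }
  }
  if (typeof obj["barge_in"] !== "boolean") {
    throw new Error("config.barge_in must be a boolean");
  }

  // Require a secure WebSocket endpoint; new URL() throws on malformed input.
  const url = new URL(obj["websocket_url"] as string);
  if (url.protocol !== "wss:") {
    throw new Error("config.websocket_url must use wss://");
  }

  return {
    voice_id: obj["voice_id"] as string,
    barge_in: obj["barge_in"] as boolean,
    vad: obj["vad"] as string,
    websocket_url: url.toString(),
    authentication: obj["authentication"] as string,
  };
}

// Hypothetical helper: with barge-in enabled, user speech detected while the
// agent is talking should stop the agent's audio playback.
function shouldInterruptPlayback(
  cfg: AgentConfig,
  agentSpeaking: boolean,
  userSpeechDetected: boolean,
): boolean {
  return cfg.barge_in && agentSpeaking && userSpeechDetected;
}
```

Validating the file up front means a misspelled key or a plain ws:// URL fails at load time instead of partway through a call. How the JWT is obtained and attached to the connection is left out, since the source does not describe it.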
