Custom · updated May 17, 2025 · other · support

Streaming Voice Agent with Gemini Flash and Silero Barge-In

A native orchestration voice agent that uses Google Gemini 2.0 Flash for conversation, Deepgram Nova-2 for real-time speech recognition, Cartesia Sonic-2 for text-to-speech synthesis, and Silero VAD for client-side voice activity detection and barge-in support. Handles both inbound and outbound calls over Plivo telephony with asyncio-based concurrent audio pipelines.
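The barge-in behaviour described above can be sketched with plain asyncio. This is a minimal illustration, not the template's actual code: `speak`, `run_turn`, and `demo` are hypothetical names, the text-to-speech playback is simulated with `asyncio.sleep`, and the VAD is stood in for by an `asyncio.Event` that a real Silero integration would set when it detects caller speech.

```python
import asyncio


async def speak(chunks, played):
    """Stand-in for TTS playback: 'plays' one chunk per tick until cancelled."""
    for chunk in chunks:
        played.append(chunk)
        await asyncio.sleep(0.01)  # simulated time to stream one audio chunk


async def run_turn(chunks, speech_detected):
    """Play agent audio, stopping as soon as the VAD event flags caller speech."""
    played = []
    playback = asyncio.create_task(speak(chunks, played))
    barge_in = asyncio.create_task(speech_detected.wait())

    # Race playback against the VAD signal; whichever finishes first wins.
    done, _ = await asyncio.wait(
        {playback, barge_in}, return_when=asyncio.FIRST_COMPLETED
    )
    interrupted = barge_in in done and playback not in done

    # Cancel the loser so no audio keeps streaming after a barge-in.
    for task in (playback, barge_in):
        if not task.done():
            task.cancel()
    await asyncio.gather(playback, barge_in, return_exceptions=True)
    return played, interrupted


async def demo():
    speech_detected = asyncio.Event()
    # Hypothetical VAD: the caller starts talking 25 ms into the agent's reply.
    asyncio.get_running_loop().call_later(0.025, speech_detected.set)
    return await run_turn([f"chunk-{i}" for i in range(10)], speech_detected)
```

Running `asyncio.run(demo())` returns the chunks that were played before the interruption together with an `interrupted` flag. Racing the two tasks with `asyncio.wait` keeps playback and detection independent, which is one common way to structure concurrent audio pipelines.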

The numbers

latency: not reported

cost / min: not reported

framework: Custom

The stack

telephony: Plivo

speech-to-text: Deepgram Nova-2

llm: Gemini 2.0 Flash

text-to-speech: Cartesia Sonic-2

System prompt

No prompt published.

Config

config.json

{
  "barge_in": true,
  "vad": "silero",
  "audio_format": "ulaw",
  "gemini_model": "gemini-2.0-flash",
  "cartesia_model": "sonic-2",
  "deepgram_model": "nova-2-phonecall",
  "sample_rate_hz": 8000,
  "cartesia_voice_id": "british-lady",
  "inbound_and_outbound": true
}
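As a rough illustration of how these settings might be consumed, the sketch below parses the config into a typed object and derives the audio frame size. `AgentConfig`, `load_config`, and `frame_bytes` are hypothetical names, and the 20 ms frame length is an assumption rather than something the template specifies. The calculation relies on mu-law (G.711) audio using one byte per sample.

```python
import json
from dataclasses import dataclass

CONFIG_TEXT = """
{
  "barge_in": true,
  "vad": "silero",
  "audio_format": "ulaw",
  "gemini_model": "gemini-2.0-flash",
  "cartesia_model": "sonic-2",
  "deepgram_model": "nova-2-phonecall",
  "sample_rate_hz": 8000,
  "cartesia_voice_id": "british-lady",
  "inbound_and_outbound": true
}
"""


@dataclass(frozen=True)
class AgentConfig:
    barge_in: bool
    vad: str
    audio_format: str
    gemini_model: str
    cartesia_model: str
    deepgram_model: str
    sample_rate_hz: int
    cartesia_voice_id: str
    inbound_and_outbound: bool


def load_config(text: str) -> AgentConfig:
    """Parse the JSON; unknown or missing keys raise TypeError."""
    return AgentConfig(**json.loads(text))


def frame_bytes(cfg: AgentConfig, frame_ms: int = 20) -> int:
    """Bytes per audio frame, assuming 8-bit mu-law (one byte per sample)."""
    if cfg.audio_format != "ulaw":
        raise ValueError(f"unsupported audio_format: {cfg.audio_format}")
    return cfg.sample_rate_hz * frame_ms // 1000


cfg = load_config(CONFIG_TEXT)
print(frame_bytes(cfg))  # 8000 samples/s * 0.020 s * 1 byte = 160
```

Using a frozen dataclass means a typo in a key name fails at load time instead of surfacing mid-call.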
