SpeechStack
Custom · updated May 17, 2025 · other · support

Streaming Voice Agent with Gemini Flash and Silero Barge-In

A native-orchestration voice agent that uses Google Gemini 2.0 Flash for conversation, Deepgram Nova-2 for real-time speech recognition, Cartesia Sonic-2 for text-to-speech, and Silero VAD for client-side voice activity detection and barge-in. It handles both inbound and outbound calls over Plivo telephony and runs its audio pipelines concurrently with asyncio.
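The barge-in behavior described above can be sketched as a race between two asyncio tasks: one plays agent audio, the other waits for the VAD to flag caller speech. This is a minimal illustration, not the template's actual code. `speak`, `run_turn`, and the `speech_detected` event are hypothetical names, and the playback loop is a stand-in for streaming Cartesia audio to Plivo.

```python
import asyncio


async def speak(chunks, played):
    """Stand-in for streaming TTS audio to the caller, one chunk at a time."""
    for chunk in chunks:
        await asyncio.sleep(0.01)  # pretend each chunk takes time to play
        played.append(chunk)


async def run_turn(chunks, speech_detected):
    """Play agent audio until it finishes or the (hypothetical) VAD event fires.

    Returns the chunks that were actually played and whether the caller
    interrupted the agent.
    """
    played = []
    playback = asyncio.create_task(speak(chunks, played))
    barge_in = asyncio.create_task(speech_detected.wait())

    done, pending = await asyncio.wait(
        {playback, barge_in}, return_when=asyncio.FIRST_COMPLETED
    )
    # Whichever task lost the race is cancelled: on barge-in this stops
    # playback; on normal completion it stops waiting for the VAD.
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)

    interrupted = barge_in in done and playback not in done
    return played, interrupted


async def main():
    vad_event = asyncio.Event()
    # Simulate the caller starting to talk shortly after playback begins.
    asyncio.get_running_loop().call_later(0.025, vad_event.set)
    played, interrupted = await run_turn(["a", "b", "c", "d", "e"], vad_event)
    print(f"played={played} interrupted={interrupted}")


if __name__ == "__main__":
    asyncio.run(main())
```

In a real agent the VAD would set the event from the inbound audio path, and the cancelled playback task would also need to flush any audio already queued at the telephony layer.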

The numbers
latency: not listed
cost / min: not listed
framework: Custom
The stack
telephony: Plivo
speech-to-text: Deepgram Nova-2
llm: Gemini 2.0 Flash
text-to-speech: Cartesia Sonic-2
System prompt
No prompt published.
Config
config.json
{
  "barge_in": true,
  "vad": "silero",
  "audio_format": "ulaw",
  "gemini_model": "gemini-2.0-flash",
  "cartesia_model": "sonic-2",
  "deepgram_model": "nova-2-phonecall",
  "sample_rate_hz": 8000,
  "cartesia_voice_id": "british-lady",
  "inbound_and_outbound": true
}
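One way to consume this config is to parse it into a typed object and sanity-check it at startup. The sketch below is an assumption about how that might look, not the template's loader: `AgentConfig`, `load_config`, and `bytes_per_frame` are hypothetical names, and the 20 ms frame size is an illustrative default. The only external fact relied on is that G.711 μ-law audio is sampled at 8 kHz with one byte per sample.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentConfig:
    """Typed view of config.json; field names mirror the keys shown above."""
    barge_in: bool
    vad: str
    audio_format: str
    gemini_model: str
    cartesia_model: str
    deepgram_model: str
    sample_rate_hz: int
    cartesia_voice_id: str
    inbound_and_outbound: bool


def load_config(text):
    """Parse the JSON text and apply two illustrative sanity checks."""
    cfg = AgentConfig(**json.loads(text))
    # Assumed rule: telephony mu-law (G.711) audio runs at 8 kHz.
    if cfg.audio_format == "ulaw" and cfg.sample_rate_hz != 8000:
        raise ValueError("ulaw audio is expected at 8000 Hz")
    # Assumed rule: barge-in needs a VAD to detect caller speech.
    if cfg.barge_in and not cfg.vad:
        raise ValueError("barge_in requires a vad to be configured")
    return cfg


def bytes_per_frame(cfg, frame_ms=20):
    """Size of one audio frame; mu-law uses one byte per sample."""
    if cfg.audio_format != "ulaw":
        raise ValueError("only ulaw is handled in this sketch")
    return cfg.sample_rate_hz * frame_ms // 1000


EXAMPLE = """
{
  "barge_in": true,
  "vad": "silero",
  "audio_format": "ulaw",
  "gemini_model": "gemini-2.0-flash",
  "cartesia_model": "sonic-2",
  "deepgram_model": "nova-2-phonecall",
  "sample_rate_hz": 8000,
  "cartesia_voice_id": "british-lady",
  "inbound_and_outbound": true
}
"""

if __name__ == "__main__":
    config = load_config(EXAMPLE)
    print(config.gemini_model, bytes_per_frame(config))
```

Failing fast on a mismatched sample rate keeps a misconfigured agent from sending audio the telephony leg cannot decode.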
Tags
native-orchestration · asyncio · vad · barge-in · telephony · inbound · outbound

contributed by @plivo · Proprietary · source: github discovery · languages: en-US