SpeechStack
Pipecat · updated Jun 1, 2025 · other · other

General Purpose Voice Agent with Open Source NVIDIA Models

A low-latency voice agent implementation using NVIDIA's open source models (Nemotron Speech ASR, Nemotron-3 Nano LLM, and Magpie TTS). It features adaptive TTS streaming, a buffered LLM with 100% KV cache reuse, and multiple deployment options: local GPU execution on DGX Spark or RTX 5090, or cloud deployment via Modal and Pipecat Cloud.
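
As a rough illustration of what "100% KV cache reuse" can mean, the sketch below measures how much of the previously processed prompt a new request can reuse. It assumes a prefix-matching cache and an append-only conversation, which the template's description does not spell out. The function names and integer token IDs are hypothetical and not taken from the template's source.

```python
from typing import Sequence


def common_prefix_len(cached: Sequence[int], prompt: Sequence[int]) -> int:
    """Return how many leading tokens of `prompt` match `cached`."""
    n = 0
    for a, b in zip(cached, prompt):
        if a != b:
            break
        n += 1
    return n


def reuse_fraction(cached: Sequence[int], prompt: Sequence[int]) -> float:
    """Fraction of the cached tokens that the new prompt can reuse."""
    if not cached:
        return 1.0  # nothing cached, so nothing is wasted
    return common_prefix_len(cached, prompt) / len(cached)


# Hypothetical token IDs: turn 2 appends to turn 1 without editing it.
turn_1 = [1, 15, 27, 42]
turn_2 = turn_1 + [8, 99, 3]
print(reuse_fraction(turn_1, turn_2))  # 1.0

# Rewriting earlier context (e.g. editing the system prompt) breaks the prefix.
edited = [1, 16, 27, 42, 8]
print(reuse_fraction(turn_1, edited))  # 0.25
```

Under these assumptions, a single conversation that only ever appends to its context keeps the whole cached prefix valid from turn to turn, which is one plausible reading of the single-slot setting in the config.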

The numbers
latency: Optimized for voice-to-voice latency with buffered LLM and adaptive TTS
cost / min: Self-hosted on NVIDIA GPU hardware (DGX Spark or RTX 5090)
framework: Pipecat
The stack
telephony: Web Only
speech-to-text: Nemotron Speech ASR
llm: Nemotron-3 Nano
text-to-speech: Magpie TTS
System prompt
No prompt published.
Config
config.json
{
  "llm_mode": "llamacpp-q8",
  "vad_type": "SmartTurn",
  "adaptive_tts": true,
  "context_size": 16384,
  "kv_cache_reuse": "100%",
  "parallel_slots": 1,
  "recording_enabled": false,
  "transport_options": [
    "webrtc",
    "daily",
    "twilio"
  ],
  "deployment_targets": [
    "local-dgx-spark",
    "local-rtx-5090",
    "modal-cloud",
    "pipecat-cloud"
  ],
  "single_slot_operation": true
}
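
A minimal sketch of how a config like the one above might be loaded and sanity-checked before starting the agent. Only a subset of the fields is reproduced here, and the consistency rules in `check_config` (a hypothetical helper) are assumptions inferred from the field names, not behavior confirmed by the template.

```python
import json

# Subset of the published config, inlined so the example is self-contained.
CONFIG_TEXT = """
{
  "llm_mode": "llamacpp-q8",
  "context_size": 16384,
  "parallel_slots": 1,
  "transport_options": ["webrtc", "daily", "twilio"],
  "single_slot_operation": true
}
"""

# Transports listed in the published config; treated here as the allowed set.
KNOWN_TRANSPORTS = {"webrtc", "daily", "twilio"}


def check_config(cfg: dict) -> list[str]:
    """Return human-readable problems; an empty list means no issue found."""
    problems = []
    # Assumed rule: single-slot operation implies exactly one parallel slot.
    if cfg["single_slot_operation"] and cfg["parallel_slots"] != 1:
        problems.append("single_slot_operation requires parallel_slots == 1")
    # Assumed rule: the context window must be a positive token count.
    if cfg["context_size"] <= 0:
        problems.append("context_size must be positive")
    unknown = set(cfg["transport_options"]) - KNOWN_TRANSPORTS
    if unknown:
        problems.append(f"unknown transports: {sorted(unknown)}")
    return problems


cfg = json.loads(CONFIG_TEXT)
print(check_config(cfg))  # []
```

Returning a list of problems rather than raising on the first one makes it easier to report every inconsistency in a single pass.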
Tags
nvidia · open-source · low-latency · self-hosted · gpu · dgx-spark · rtx-5090 · websocket · adaptive-streaming · kv-cache
contributed by @kwindla · Proprietary · source: github discovery · languages: en-US