
Pipecat · updated Jun 1, 2025 · other · other

General Purpose Voice Agent with Open Source NVIDIA Models

A low-latency voice agent built on NVIDIA's open-source models: Nemotron Speech ASR, Nemotron-3 Nano LLM, and Magpie TTS. It features adaptive TTS streaming and a buffered LLM with 100% KV cache reuse. It can run locally on a GPU (DGX Spark or RTX 5090) or in the cloud via Modal and Pipecat Cloud.


The numbers

latency: Optimized for voice-to-voice latency with buffered LLM and adaptive TTS

cost / min: Self-hosted on NVIDIA GPU hardware (DGX Spark or RTX 5090)

framework: Pipecat

The stack

telephony: Web Only

speech-to-text: Deepgram Nova-3

llm: Llama 3.3 70B

text-to-speech: Cartesia Sonic-3

System prompt

No prompt published.

Config

config.json

{
  "llm_mode": "llamacpp-q8",
  "vad_type": "SmartTurn",
  "adaptive_tts": true,
  "context_size": 16384,
  "kv_cache_reuse": "100%",
  "parallel_slots": 1,
  "recording_enabled": false,
  "transport_options": [
    "webrtc",
    "daily",
    "twilio"
  ],
  "deployment_targets": [
    "local-dgx-spark",
    "local-rtx-5090",
    "modal-cloud",
    "pipecat-cloud"
  ],
  "single_slot_operation": true
}
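
The config above can be sanity-checked before launching the agent. The following is a minimal sketch, not part of the template: the `check_config` helper and its rules are assumptions made for illustration, inferred only from the field names shown (for example, that `single_slot_operation` should agree with `parallel_slots`). The template's own loader may validate differently or not at all.

```python
import json

# The config.json shown above, embedded so the sketch is self-contained.
CONFIG_TEXT = """
{
  "llm_mode": "llamacpp-q8",
  "vad_type": "SmartTurn",
  "adaptive_tts": true,
  "context_size": 16384,
  "kv_cache_reuse": "100%",
  "parallel_slots": 1,
  "recording_enabled": false,
  "transport_options": ["webrtc", "daily", "twilio"],
  "deployment_targets": [
    "local-dgx-spark", "local-rtx-5090", "modal-cloud", "pipecat-cloud"
  ],
  "single_slot_operation": true
}
"""


def check_config(cfg: dict) -> list[str]:
    """Return a list of human-readable problems (hypothetical rules)."""
    problems = []

    # Assumption: single-slot operation implies exactly one parallel slot.
    if cfg.get("single_slot_operation") and cfg.get("parallel_slots") != 1:
        problems.append("single_slot_operation is true but parallel_slots is not 1")

    # The context size should be a positive integer number of tokens.
    size = cfg.get("context_size")
    if not isinstance(size, int) or size <= 0:
        problems.append("context_size must be a positive integer")

    # At least one transport and one deployment target should be listed.
    for key in ("transport_options", "deployment_targets"):
        if not cfg.get(key):
            problems.append(f"{key} must list at least one entry")

    return problems


config = json.loads(CONFIG_TEXT)
problems = check_config(config)

if problems:
    for problem in problems:
        print("config problem:", problem)
else:
    print("config OK")
```

Returning a list of problems rather than raising on the first one lets a deployment script report every inconsistency in a single pass.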
