
Custom · updated May 27, 2025 · other · transcription-and-summary

Multi-Agent Voice Pipeline with Transcription and Synthesis

A three-agent crew that processes spoken audio end-to-end: one agent transcribes speech, a research analyst extracts key insights, and a speaker agent delivers the analysis as spoken audio. Demonstrates sequential multi-agent coordination with voice input and output.
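The sequential handoff described above can be sketched in a few lines of Python. This is an illustration only, not the template's source: the `Agent` dataclass, `run_sequential`, and the three stub functions are hypothetical stand-ins, and the stubs return placeholder values instead of calling any speech or LLM service.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Agent:
    role: str
    goal: str
    run: Callable[[object], object]  # receives the previous agent's output


def transcribe_audio(audio: bytes) -> str:
    # Hypothetical stub: a real version would send the audio to an STT service.
    return f"<transcript of {len(audio)} bytes>"


def extract_insights(transcript: str) -> str:
    # Hypothetical stub: a real version would prompt an LLM with the transcript.
    return f"Key insights from {transcript}"


def speak_text(summary: str) -> bytes:
    # Hypothetical stub: a real version would return synthesized audio bytes.
    return summary.encode("utf-8")


def run_sequential(agents: List[Agent], payload: object) -> object:
    # Each agent consumes the previous agent's output, in list order.
    for agent in agents:
        payload = agent.run(payload)
    return payload


crew = [
    Agent("Voice Listener", "Transcribe audio", transcribe_audio),
    Agent("Research Analyst", "Extract key insights", extract_insights),
    Agent("Voice Speaker", "Synthesize summary", speak_text),
]

# Audio in, audio out: bytes -> transcript -> insights -> bytes.
audio_out = run_sequential(crew, b"\x00" * 1024)
```

Because each stage only sees the output of the stage before it, swapping a stub for a real service call should not require changes to the other two agents.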


The numbers

latency: not reported

cost / min: not reported

framework: Custom

The stack

telephony: Web Only

speech-to-text: Deepgram Nova-3

llm: GPT-4o-mini

text-to-speech: Deepgram Aura-2

System prompt

No prompt published.

Config

config.json

{
  "agents": [
    {
      "goal": "Transcribe audio using Deepgram STT",
      "role": "Voice Listener",
      "tool": "transcribe_audio"
    },
    {
      "goal": "Extract key insights from transcript",
      "role": "Research Analyst",
      "backend": "GPT-4.1-mini"
    },
    {
      "goal": "Synthesize summary into spoken audio",
      "role": "Voice Speaker",
      "tool": "speak_text"
    }
  ],
  "process": "sequential",
  "stt_model": "nova-3",
  "tts_model": "aura-2-asteria-en",
  "smart_format": true
}
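
One plausible way the top-level `stt_model`, `tts_model`, and `smart_format` keys could be consumed is as query parameters on Deepgram's REST endpoints (`/v1/listen` for transcription, `/v1/speak` for synthesis). The template's source is not shown here, so this mapping is an assumption, and `build_deepgram_urls` is a hypothetical helper. The sketch only builds URLs and makes no network calls.

```python
import json
from urllib.parse import urlencode

# Subset of config.json relevant to the speech services.
CONFIG_JSON = """
{
  "stt_model": "nova-3",
  "tts_model": "aura-2-asteria-en",
  "smart_format": true
}
"""


def build_deepgram_urls(config: dict) -> dict:
    # Assumption: the config keys map one-to-one onto Deepgram query parameters.
    stt_query = urlencode({
        "model": config["stt_model"],
        # Query strings carry booleans as lowercase text.
        "smart_format": str(config.get("smart_format", False)).lower(),
    })
    tts_query = urlencode({"model": config["tts_model"]})
    return {
        "stt": f"https://api.deepgram.com/v1/listen?{stt_query}",
        "tts": f"https://api.deepgram.com/v1/speak?{tts_query}",
    }


urls = build_deepgram_urls(json.loads(CONFIG_JSON))
print(urls["stt"])
print(urls["tts"])
```

An actual request would also need an `Authorization: Token <API key>` header and an audio or text body. Check the current Deepgram documentation before relying on these parameter names.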
