Custom · updated May 27, 2025 · other · transcription-and-summary

Multi-Agent Voice Pipeline with Transcription and Synthesis

A three-agent crew that processes spoken audio end-to-end: one agent transcribes speech, a research analyst extracts key insights, and a speaker agent delivers the analysis as spoken audio. Demonstrates sequential multi-agent coordination with voice input and output.
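The sequential handoff described above can be sketched in plain Python. This is a minimal illustration, not the template's actual implementation: the `Agent` dataclass, `run_sequential`, and the body of each function are hypothetical stand-ins that make no network calls. In a real pipeline the first and last steps would call a speech-to-text and a text-to-speech service, and the middle step would call an LLM.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class Agent:
    """Hypothetical container: a role name plus the callable that does the work."""
    role: str
    run: Callable[[Any], Any]


def transcribe_audio(audio: bytes) -> str:
    # Stand-in for a speech-to-text call; here the "audio" is just UTF-8 text.
    return audio.decode("utf-8")


def extract_insights(transcript: str) -> str:
    # Stand-in for an LLM call; keeps only the first sentence as the "insight".
    first_sentence = transcript.split(".")[0].strip()
    return f"Key insight: {first_sentence}."


def speak_text(summary: str) -> bytes:
    # Stand-in for a text-to-speech call; returns bytes in place of audio.
    return summary.encode("utf-8")


def run_sequential(agents: List[Agent], payload: Any) -> Any:
    """Pass the output of each agent to the next one, in list order."""
    for agent in agents:
        payload = agent.run(payload)
    return payload


crew = [
    Agent("Voice Listener", transcribe_audio),
    Agent("Research Analyst", extract_insights),
    Agent("Voice Speaker", speak_text),
]

result = run_sequential(crew, b"Revenue grew in Q2. Costs were flat.")
print(result)
```

Each agent only sees the previous agent's output, which is what makes the process sequential: the analyst never touches raw audio, and the speaker never sees the full transcript.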

The numbers
latency
cost / min
framework: Custom
The stack
telephony: Web Only
speech-to-text: Deepgram Nova-3
llm: GPT-4o-mini
text-to-speech: Deepgram Aura-2
System prompt
No prompt published.
Config
config.json
{
  "agents": [
    {
      "goal": "Transcribe audio using Deepgram STT",
      "role": "Voice Listener",
      "tool": "transcribe_audio"
    },
    {
      "goal": "Extract key insights from transcript",
      "role": "Research Analyst",
      "backend": "GPT-4.1-mini"
    },
    {
      "goal": "Synthesize summary into spoken audio",
      "role": "Voice Speaker",
      "tool": "speak_text"
    }
  ],
  "process": "sequential",
  "stt_model": "nova-3",
  "tts_model": "aura-2-asteria-en",
  "smart_format": true
}
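One way to read this config into an ordered list of steps is sketched below. The `load_pipeline` helper is hypothetical, and the validation rules it applies (the process must be `sequential`, and each agent must declare exactly one of `tool` or `backend`) are assumptions inferred from the shape of the JSON above, not documented behavior of the template.

```python
import json
from typing import List, Tuple

# The config from above, reproduced verbatim so the sketch is self-contained.
CONFIG = """
{
  "agents": [
    {"goal": "Transcribe audio using Deepgram STT",
     "role": "Voice Listener", "tool": "transcribe_audio"},
    {"goal": "Extract key insights from transcript",
     "role": "Research Analyst", "backend": "GPT-4.1-mini"},
    {"goal": "Synthesize summary into spoken audio",
     "role": "Voice Speaker", "tool": "speak_text"}
  ],
  "process": "sequential",
  "stt_model": "nova-3",
  "tts_model": "aura-2-asteria-en",
  "smart_format": true
}
"""


def load_pipeline(raw: str) -> List[Tuple[str, str, str]]:
    """Return (role, handler_kind, handler_name) for each agent, in order."""
    cfg = json.loads(raw)
    if cfg.get("process") != "sequential":
        raise ValueError("this sketch only handles the sequential process")
    steps = []
    for agent in cfg["agents"]:
        # Assumed rule: an agent is driven by either a tool or an LLM backend.
        kinds = [key for key in ("tool", "backend") if key in agent]
        if len(kinds) != 1:
            raise ValueError(f"{agent.get('role')}: expected one of tool/backend")
        kind = kinds[0]
        steps.append((agent["role"], kind, agent[kind]))
    return steps


steps = load_pipeline(CONFIG)
for role, kind, name in steps:
    print(f"{role}: {kind}={name}")
```

Keeping the agents as an ordered list means the execution order is simply the order in the JSON array, with no separate dependency graph to maintain.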
Tags
crewai · multi-agent · sequential-processing · voice-to-voice

contributed by @lukeocodes · MIT · source: github discovery · languages: en-US