Whispered Secrets: Building An Open-Source Tool To Live Transcribe & Summarize Conversations

Learn how to build a privacy-focused, open-source tool for real-time conversation transcription & summarization using Python, Whisper, and Streamlit - works offline!

Key takeaways
  • Built an open-source transcription pipeline using Python, Whisper, and Streamlit to create a local alternative to services like Fireflies.ai

  • System components include:

    • SpeechRecognition library for audio input
    • Whisper for transcription
    • Ollama for local LLM summarization
    • Streamlit for the user interface
    • Thread-safe queuing system for audio processing
  • Key advantages of local implementation:

    • Works offline
    • Can handle sensitive/confidential information
    • Customizable for specific needs
    • No data sent to external servers
  • Technical considerations:

    • The tiny.en Whisper model requires ~30-70 MB and runs on modest hardware
    • Medium model needs ~4GB
    • Thread management and safety required
    • Streamlit state management needs careful handling
  • Functionality includes:

    • Live microphone input
    • Real-time transcription
    • Automatic summarization
    • Speaker detection capabilities
    • Configurable energy thresholds
    • Customizable model selection
  • Challenges faced:

    • Managing thread safety
    • Handling Streamlit’s reload cycle
    • Real-time processing issues
    • Session state persistence
    • Audio chunking and timing
  • Code is open source and built following modern Python practices:

    • Uses Poetry for dependency management
    • Includes CI pipeline
    • Follows the Cookiecutter Data Science template
    • Available on GitHub
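
To make the thread-safe queuing and energy-threshold ideas from the takeaways more concrete, here is a minimal sketch using only the Python standard library. It is not the project's actual code: the names `rms_energy`, `enqueue_if_loud`, `transcription_worker`, and `run_pipeline`, the threshold value, and the assumption of 16-bit little-endian mono PCM chunks are all hypothetical, and the `transcribe` callable stands in for whatever wraps the Whisper model.

```python
import math
import queue
import struct
import threading
from typing import Callable, Iterable, List

# Illustrative value only; the real tool makes the threshold configurable.
ENERGY_THRESHOLD = 300.0
_STOP = None  # sentinel placed on the queue to tell the worker to exit


def rms_energy(chunk: bytes) -> float:
    """Root-mean-square amplitude of 16-bit little-endian mono PCM (assumed format)."""
    n = len(chunk) // 2
    if n == 0:
        return 0.0
    samples = struct.unpack(f"<{n}h", chunk[: n * 2])
    return math.sqrt(sum(s * s for s in samples) / n)


def enqueue_if_loud(chunk: bytes, audio_queue: queue.Queue,
                    threshold: float = ENERGY_THRESHOLD) -> bool:
    """Producer side: drop quiet chunks, queue the rest for transcription."""
    if rms_energy(chunk) < threshold:
        return False
    audio_queue.put(chunk)  # queue.Queue handles its own locking
    return True


def transcription_worker(audio_queue: queue.Queue,
                         transcribe: Callable[[bytes], str],
                         transcript: List[str],
                         lock: threading.Lock) -> None:
    """Consumer side: pull chunks until the sentinel arrives."""
    while True:
        chunk = audio_queue.get()
        if chunk is _STOP:
            break
        text = transcribe(chunk)
        with lock:  # guard the shared transcript list
            transcript.append(text)


def run_pipeline(chunks: Iterable[bytes],
                 transcribe: Callable[[bytes], str],
                 threshold: float = ENERGY_THRESHOLD) -> List[str]:
    """Feed chunks through the queue and return the collected transcript."""
    audio_queue: queue.Queue = queue.Queue()
    transcript: List[str] = []
    lock = threading.Lock()
    worker = threading.Thread(
        target=transcription_worker,
        args=(audio_queue, transcribe, transcript, lock),
        daemon=True,
    )
    worker.start()
    for chunk in chunks:
        enqueue_if_loud(chunk, audio_queue, threshold)
    audio_queue.put(_STOP)
    worker.join()
    return transcript


if __name__ == "__main__":
    quiet = struct.pack("<4h", 0, 0, 0, 0)
    loud = struct.pack("<4h", 1000, -1000, 1000, -1000)
    # A fake transcriber keeps the sketch runnable without a model or microphone.
    print(run_pipeline([quiet, loud], lambda c: f"<{len(c)} bytes of speech>"))
```

In a real setup, `transcribe` would likely wrap a loaded Whisper model, and the chunks would come from the microphone callback rather than a list. Keeping the transcriber as an injected callable also makes the queue logic easy to test without audio hardware.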