Whispered Secrets: Building An Open-Source Tool To Live Transcribe & Summarize Conversations

Python

Learn how to build a privacy-focused, open-source tool for real-time conversation transcription & summarization using Python, Whisper, and Streamlit - works offline!

Key takeaways

An open-source transcription pipeline built with Python, Whisper, and Streamlit serves as a local alternative to hosted services like Fireflies.ai
System components include:
- SpeechRecognition library for microphone audio input
- Whisper for transcription
- Ollama for local LLM summarization
- Streamlit for the user interface
- Thread-safe queuing system for audio processing
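
As a rough illustration of how these components could hand work to each other, here is a minimal sketch of a thread-safe queue between an audio producer and a transcription worker. It uses only the standard library; `fake_transcribe`, `transcription_worker`, and `run_pipeline` are hypothetical names, and the stand-in function takes the place of a real Whisper call. It is not the project's actual code.

```python
import queue
import threading

_STOP = object()  # sentinel that tells the worker to exit


def fake_transcribe(chunk: bytes) -> str:
    """Hypothetical stand-in for a Whisper call; returns a placeholder string."""
    return f"<{len(chunk)} bytes transcribed>"


def transcription_worker(audio_q, text_q, transcribe=fake_transcribe):
    """Consume audio chunks until the sentinel arrives, pushing text to text_q."""
    while True:
        chunk = audio_q.get()  # blocks until a chunk is available
        if chunk is _STOP:
            break
        text_q.put(transcribe(chunk))


def run_pipeline(chunks):
    """Feed chunks through the worker thread and collect the transcribed text."""
    audio_q = queue.Queue()
    text_q = queue.Queue()
    worker = threading.Thread(
        target=transcription_worker, args=(audio_q, text_q), daemon=True
    )
    worker.start()
    for chunk in chunks:
        audio_q.put(chunk)  # in a live tool, the microphone callback would do this
    audio_q.put(_STOP)
    worker.join()

    results = []
    while not text_q.empty():  # safe here: the worker has already finished
        results.append(text_q.get())
    return results


if __name__ == "__main__":
    print(run_pipeline([b"ab", b"cdef"]))
```

`queue.Queue` handles the locking internally, so the producer and the worker never touch shared state directly.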
Key advantages of local implementation:
- Works offline
- Can handle sensitive/confidential information
- Customizable for specific needs
- No data sent to external servers
Technical considerations:
- Whisper's tiny.en model requires ~30-70MB and runs on modest hardware
- The medium model needs ~4GB
- Thread management and safety required
- Streamlit state management needs careful handling
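
One way to keep background threads away from Streamlit's state is to have the worker write only to a queue and let the script drain that queue on each rerun. The sketch below assumes this pattern; `drain_into_state` is a hypothetical helper, and it accepts any dict-like object so it can be tried without Streamlit installed. In an app, `st.session_state` (which supports dict-style access) would be passed as `state`.

```python
import queue
from collections.abc import MutableMapping


def drain_into_state(state: MutableMapping, text_q: queue.Queue,
                     key: str = "transcript") -> int:
    """Move all pending lines from text_q into state[key]; return how many moved.

    Only the main script thread calls this, so the background worker
    never mutates the session state itself.
    """
    if key not in state:
        state[key] = []  # initialise once; later reruns reuse the same list
    added = 0
    while True:
        try:
            line = text_q.get_nowait()
        except queue.Empty:
            break
        state[key].append(line)
        added += 1
    return added


if __name__ == "__main__":
    q = queue.Queue()
    q.put("hello")
    q.put("world")
    session = {}  # stand-in for st.session_state
    drain_into_state(session, q)
    print(session["transcript"])
```

Because the transcript list lives in the state mapping, it survives a rerun of the script as long as the mapping itself does.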
Functionality includes:
- Live microphone input
- Real-time transcription
- Automatic summarization
- Speaker detection capabilities
- Configurable energy thresholds
- Customizable model selection
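
To make the idea of a configurable energy threshold concrete, here is a small sketch of an energy gate over raw audio. It assumes 16-bit signed mono PCM in native byte order; `rms_energy` and `is_speech` are hypothetical helpers, and the default of 300 is only an illustrative value, not a setting taken from the project.

```python
import array
import math


def rms_energy(pcm: bytes) -> float:
    """Root-mean-square amplitude of 16-bit signed mono PCM (native byte order)."""
    usable = len(pcm) - len(pcm) % 2  # ignore a trailing odd byte
    samples = array.array("h")
    samples.frombytes(pcm[:usable])
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def is_speech(pcm: bytes, energy_threshold: float = 300.0) -> bool:
    """Treat a chunk as speech when its RMS energy exceeds the threshold."""
    return rms_energy(pcm) > energy_threshold


if __name__ == "__main__":
    quiet = array.array("h", [0] * 100).tobytes()
    loud = array.array("h", [1000, -1000] * 50).tobytes()
    print(is_speech(quiet), is_speech(loud))
```

Raising the threshold makes the gate ignore more background noise, at the cost of possibly clipping quiet speakers.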
Challenges faced:
- Managing thread safety
- Handling Streamlit’s reload cycle
- Real-time processing issues
- Session state persistence
- Audio chunking and timing
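
The chunking and timing challenge can be illustrated with a short sketch that splits a PCM buffer into fixed-length pieces and records where each piece starts. The 16 kHz sample rate matches the rate Whisper works at; the two-second chunk length and the `chunk_pcm` name are assumptions made for illustration only.

```python
from typing import Iterator, Tuple


def chunk_pcm(pcm: bytes, sample_rate: int = 16000, chunk_seconds: float = 2.0,
              sample_width: int = 2) -> Iterator[Tuple[float, bytes]]:
    """Yield (start_time_in_seconds, chunk_bytes) pairs for mono PCM audio."""
    bytes_per_chunk = int(sample_rate * chunk_seconds) * sample_width
    if bytes_per_chunk <= 0:
        raise ValueError("chunk_seconds is too small for this sample rate")
    bytes_per_second = sample_rate * sample_width
    for offset in range(0, len(pcm), bytes_per_chunk):
        # The final chunk may be shorter than the others.
        yield offset / bytes_per_second, pcm[offset:offset + bytes_per_chunk]


if __name__ == "__main__":
    five_seconds = bytes(16000 * 2 * 5)  # five seconds of silence
    for start, chunk in chunk_pcm(five_seconds):
        print(f"{start:.1f}s: {len(chunk)} bytes")
```

Shorter chunks lower latency but give the model less context per call, which is one reason the chunk length is worth exposing as a setting.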
Code is open source and built following modern Python practices:
- Uses Poetry for dependency management
- Includes CI pipeline
- Follows a Cookiecutter data science project template
- Available on GitHub
