Exo: Run your own AI cluster at home by Mohamed Baioumy


Learn how Exo lets you run AI models on personal devices by combining their computing power, enabling private, low-latency inference without expensive GPUs or cloud services.

Key takeaways
  • Exo is an open-source library that enables running AI clusters on everyday devices like phones, laptops, and watches by aggregating their computing power

  • Key benefits of running models locally include:

    • Enhanced privacy by keeping data on personal devices
    • Lower latency compared to cloud solutions
    • Ability to run large models without expensive GPUs
    • Near-linear throughput scaling as devices are added (up to a point)
  • Two main scenarios for running models:

    • When the model fits in device memory - execution is straightforward
    • When the model is too large - layers are loaded sequentially, with careful memory management (see the sequential-loading sketch after this list)
  • Performance considerations:

    • Adding devices improves throughput but not necessarily the latency of an individual request (see the pipeline arithmetic after this list)
    • Network connection type affects speed - Thunderbolt is faster than Wi-Fi
    • Device capabilities matter - a phone delivers roughly a quarter of a laptop's computing power
    • Embeddings transferred between devices are small (~16KB, see the back-of-the-envelope numbers after this list)
  • Technical implementation details:

    • Models can be partitioned across devices based on available memory (see the partitioning sketch after this list)
    • 4-bit quantization cuts model size to roughly a quarter of FP16 - a 405B-parameter model shrinks from ~810GB to ~200GB
    • Uses the tinygrad backend for hardware support
    • Simple installation via Python and shell script
    • Supports various AI accelerators and GPUs
  • Challenges of open-source AI vs traditional open-source software:

    • Requires significant upfront capital
    • Hardware limitations for consumer devices
    • Complex memory management for large models
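
To make the "too large" scenario concrete, here is a minimal sketch of sequential layer loading. This is not Exo's implementation: it assumes each layer's weights are saved as a separate NumPy file and treats a layer as a plain matrix multiply, but it shows the core trick - only one layer's weights are resident in memory at a time.

```python
import numpy as np

def run_larger_than_memory(layer_files, x):
    """Run a model that doesn't fit in RAM by streaming one layer at a time.

    layer_files: paths to per-layer weight matrices saved with np.save (assumed layout)
    x: input activation vector
    """
    for path in layer_files:
        w = np.load(path)           # load this layer's weights from disk
        x = np.maximum(w @ x, 0.0)  # apply the layer (a ReLU MLP, for illustration)
        del w                       # release the weights before loading the next layer
    return x
```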
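
The throughput-versus-latency takeaway is just pipeline arithmetic. Assuming a model split into four stages of 0.5 seconds each (hypothetical numbers): a single request must still pass through every stage, so its latency does not improve, but once the pipeline is full a new result completes every 0.5 seconds.

```python
stages = 4        # devices, each holding one slice of the model (assumed)
stage_time = 0.5  # seconds of compute per stage (assumed)

latency = stages * stage_time  # one request still visits every stage: 2.0s
throughput = 1 / stage_time    # with the pipeline kept full: 2 requests/s

print(f"latency: {latency}s per request, throughput: {throughput} requests/s")
```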
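
The two size figures above follow from back-of-the-envelope arithmetic; the hidden dimension and data types are assumptions for illustration, not values measured from Exo.

```python
# Inter-device transfer: a single 4096-dim fp32 activation vector (assumed sizes)
hidden_dim = 4096
print(hidden_dim * 4)      # 16384 bytes, i.e. ~16KB per transfer

# Quantization: bytes per weight drop from 2 (fp16) to 0.5 (4-bit)
params = 405e9             # a 405B-parameter model
print(params * 2 / 1e9)    # ~810 GB at fp16
print(params * 0.5 / 1e9)  # ~203 GB at 4 bits
```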
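
Finally, a simplified sketch of memory-based partitioning: give each device a contiguous slice of layers proportional to its available memory. Exo's actual partitioning strategy is more sophisticated; the function and device sizes below are illustrative only.

```python
def partition_layers(num_layers, device_memory_gb):
    """Assign each device a contiguous range of layers, sized by its memory."""
    total_mem = sum(device_memory_gb)
    shards, start = [], 0
    for i, mem in enumerate(device_memory_gb):
        if i == len(device_memory_gb) - 1:
            count = num_layers - start  # last device absorbs rounding leftovers
        else:
            count = round(num_layers * mem / total_mem)
        shards.append(range(start, start + count))
        start += count
    return shards

# An 80-layer model across a 64GB desktop, a 16GB laptop, and an 8GB phone:
print(partition_layers(80, [64, 16, 8]))
# [range(0, 58), range(58, 73), range(73, 80)]
```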