Dmitriy Pastushenkov, Adrian Boguszewski - Cloud? No Thanks! I’m Gonna Run GenAI on My AI PC

AI

Learn how to run GenAI models locally on Intel Core Ultra processors using OpenVINO. Discover efficient AI workload distribution across CPU, GPU & NPU without cloud dependencies.

Key takeaways
  • Intel Core Ultra processors include three AI engines - CPU, integrated GPU, and NPU (Neural Processing Unit) - each optimized for different AI workloads

  • The NPU enables low-power AI acceleration (around 20 W vs. 40 W on the CPU), extending battery life while still delivering good performance for continuous background AI tasks

  • The OpenVINO toolkit deploys AI models across Intel hardware (CPU/GPU/NPU) with minimal code changes - switching devices is essentially a one-line change (see the device-selection sketch after this list)

  • The new OpenVINO GenAI library runs local LLMs in as few as three lines of code and pulls in far fewer dependencies than HuggingFace pipelines (see the GenAI sketch after this list)

  • Different AI workloads can be distributed to whichever engine suits them best (see the performance-hints sketch after this list):

    • CPU: Fast-response, low-latency tasks
    • GPU: High-throughput tasks such as chatbots
    • NPU: Background tasks requiring power efficiency
  • Models can be quantized and compressed (e.g., from 25 GB to 5 GB) while maintaining accuracy for local execution (see the compression sketch after this list)

  • OpenVINO supports multiple frameworks, including PyTorch, ONNX, and TensorFlow, and integrates with popular tools like HuggingFace and LangChain (see the conversion sketch after this list)

  • RAG (Retrieval-Augmented Generation) applications are supported through the LangChain integration (see the RAG sketch after this list)

  • All inference runs locally without cloud dependencies - no data needs to leave the device

  • Next-gen Lunar Lake processors will feature NPUs with 45 TOPS of performance (about 4x the current generation)
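
To make the "minimal code changes" point concrete, here is a minimal device-selection sketch using OpenVINO's Python API. The path `model.xml` is a hypothetical, already-converted OpenVINO IR file, not something from the talk.

```python
# Minimal sketch: the same model, compiled for three different AI engines.
import openvino as ov

core = ov.Core()
print(core.available_devices)  # e.g. ['CPU', 'GPU', 'NPU'] on a Core Ultra machine

model = core.read_model("model.xml")  # hypothetical pre-converted IR model

# Switching hardware is just a different device string; the surrounding
# inference code stays the same.
compiled_cpu = core.compile_model(model, "CPU")
compiled_gpu = core.compile_model(model, "GPU")
compiled_npu = core.compile_model(model, "NPU")
```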
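The three-lines-of-code claim maps onto the OpenVINO GenAI `LLMPipeline` API, sketched below. `TinyLlama-ov` is a hypothetical local folder holding an LLM already exported to OpenVINO format.

```python
# Minimal GenAI sketch: load a local LLM and generate text, no cloud involved.
import openvino_genai

pipe = openvino_genai.LLMPipeline("TinyLlama-ov", "GPU")  # or "CPU" / "NPU"
print(pipe.generate("What is an NPU?", max_new_tokens=100))
```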
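One way to express the workload distribution in code is through OpenVINO's standard performance hints, sketched below with the same hypothetical `model.xml`. The device-to-task mapping mirrors the takeaway list; the hint properties are regular OpenVINO configuration, not something demonstrated in the talk.

```python
# Sketch: match each workload to the engine and hint that suits it.
import openvino as ov
import openvino.properties.hint as hints

core = ov.Core()
model = core.read_model("model.xml")  # hypothetical pre-converted IR model

# Interactive task: tune the CPU for response time.
low_latency = core.compile_model(
    model, "CPU", {hints.performance_mode: hints.PerformanceMode.LATENCY}
)

# Chatbot-style batch work: tune the GPU for throughput.
high_throughput = core.compile_model(
    model, "GPU", {hints.performance_mode: hints.PerformanceMode.THROUGHPUT}
)

# Always-on background task: run on the NPU to save power.
background = core.compile_model(model, "NPU")
```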
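For the compression takeaway, the sketch below uses NNCF 4-bit weight compression, which is one route to the kind of 25 GB to 5 GB reduction mentioned above. `llm.xml` is a hypothetical IR file, and the INT4 asymmetric mode is an assumed choice, not one stated in the talk.

```python
# Sketch: shrink an LLM's weights to INT4 so it fits on a laptop.
import openvino as ov
import nncf

core = ov.Core()
model = core.read_model("llm.xml")  # hypothetical large-language-model IR

# Weights drop to 4-bit integers while activations stay in floating point,
# which is how model size shrinks several-fold with little accuracy loss.
compressed = nncf.compress_weights(model, mode=nncf.CompressWeightsMode.INT4_ASYM)
ov.save_model(compressed, "llm-int4.xml")
```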
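Framework support can be shown with a conversion sketch: a PyTorch model goes through `ov.convert_model` and is saved as OpenVINO IR. torchvision's `resnet50` is only a stand-in example model.

```python
# Sketch: convert a stock PyTorch model to OpenVINO IR.
import torch
import torchvision
import openvino as ov

torch_model = torchvision.models.resnet50(weights="DEFAULT").eval()

# example_input lets OpenVINO trace the model's shapes during conversion.
ov_model = ov.convert_model(torch_model, example_input=torch.rand(1, 3, 224, 224))
ov.save_model(ov_model, "resnet50.xml")
```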
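Finally, a RAG sketch wiring LangChain's OpenVINO integrations into a tiny retrieval loop. The model IDs, sample texts, and prompt format are illustrative assumptions, not details from the talk; everything still runs on the local machine.

```python
# Sketch: a minimal local RAG pipeline with OpenVINO-backed LangChain parts.
from langchain_community.embeddings import OpenVINOEmbeddings
from langchain_community.llms import HuggingFacePipeline
from langchain_community.vectorstores import FAISS

# Embeddings run locally on the CPU.
embeddings = OpenVINOEmbeddings(
    model_name_or_path="sentence-transformers/all-MiniLM-L6-v2",
    model_kwargs={"device": "CPU"},
)
store = FAISS.from_texts(
    ["OpenVINO runs models on CPU, GPU and NPU.", "NPUs are power efficient."],
    embeddings,
)

# The LLM runs locally on the GPU via the OpenVINO backend.
llm = HuggingFacePipeline.from_model_id(
    model_id="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    task="text-generation",
    backend="openvino",
    model_kwargs={"device": "GPU"},
)

# Retrieve relevant snippets and stuff them into the prompt.
question = "Which engine should a background task use?"
context = "\n".join(d.page_content for d in store.similarity_search(question, k=2))
print(llm.invoke(f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"))
```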