Dmitriy Pastushenkov, Adrian Boguszewski - Cloud? No Thanks! I’m Gonna Run GenAI on My AI PC
Learn how to run GenAI models locally on Intel Core Ultra processors using OpenVINO. Discover efficient AI workload distribution across CPU, GPU & NPU without cloud dependencies.
- Intel Core Ultra processors include three AI engines: the CPU, the integrated GPU, and the NPU (Neural Processing Unit), each optimized for different AI workloads
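As a quick sanity check, OpenVINO's Python API can enumerate the engines it sees on a given machine. A minimal sketch (device names vary with drivers and hardware):

```python
import openvino as ov

core = ov.Core()
# On a Core Ultra machine this typically prints CPU, GPU and NPU entries.
for device in core.available_devices:
    print(device, core.get_property(device, "FULL_DEVICE_NAME"))
```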
- The NPU enables low-power AI acceleration (around 20 W vs. 40 W on the CPU), extending battery life while still delivering good performance for continuous background AI tasks
- The OpenVINO toolkit makes it easy to deploy AI models across Intel hardware (CPU/GPU/NPU) with minimal code changes
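In practice the "minimal code change" is the device string passed at compile time. A minimal sketch (the model path is a placeholder for any converted model):

```python
import openvino as ov

core = ov.Core()
model = core.read_model("model.xml")  # placeholder path to an OpenVINO IR model

# Retargeting is a one-argument change; the model itself is untouched.
on_cpu = core.compile_model(model, "CPU")
on_gpu = core.compile_model(model, "GPU")
on_npu = core.compile_model(model, "NPU")
```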
- The new OpenVINO GenAI library runs local LLMs with just three lines of code and a much smaller dependency footprint than Hugging Face pipelines
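Those three lines look roughly like the sketch below, using the openvino_genai package; the model directory is a placeholder for an LLM already exported to OpenVINO format:

```python
import openvino_genai

# Placeholder path to a local, OpenVINO-format LLM directory.
pipe = openvino_genai.LLMPipeline("TinyLlama-1.1B-Chat-ov", "GPU")
print(pipe.generate("What is an NPU?", max_new_tokens=100))
```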
- Different AI workloads can be distributed across the engines where they fit best (see the sketch after this list):
  - CPU: fast-response, low-latency tasks
  - GPU: high-throughput tasks such as chatbots
  - NPU: background tasks requiring power efficiency
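Instead of hard-coding a device, OpenVINO can also pick one from a performance hint, which maps naturally onto the split above. A sketch using the string-based config keys (the properties API offers typed equivalents):

```python
import openvino as ov

core = ov.Core()
model = core.read_model("model.xml")  # placeholder path

# LATENCY suits interactive, fast-response work; THROUGHPUT suits batch-style jobs.
interactive = core.compile_model(model, "AUTO", {"PERFORMANCE_HINT": "LATENCY"})
batch = core.compile_model(model, "AUTO", {"PERFORMANCE_HINT": "THROUGHPUT"})
```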
- Models can be quantized and compressed (e.g. from 25 GB down to 5 GB) while maintaining accuracy for local execution
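Compression of that order typically comes from low-bit weight quantization. A sketch with NNCF's weight-compression API (the path and parameters are illustrative, not the speakers' exact settings):

```python
import nncf
import openvino as ov

core = ov.Core()
model = core.read_model("llm.xml")  # placeholder: an FP16 LLM in OpenVINO format

# 4-bit weight compression shrinks the model several-fold at a modest accuracy cost.
compressed = nncf.compress_weights(
    model,
    mode=nncf.CompressWeightsMode.INT4_ASYM,
    group_size=128,
    ratio=0.8,  # share of layers compressed to INT4; the rest stay at INT8
)
ov.save_model(compressed, "llm-int4.xml")
```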
- OpenVINO supports multiple frameworks, including PyTorch, ONNX, and TensorFlow, and integrates with popular tools such as Hugging Face and LangChain
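A PyTorch model, for instance, can be converted in memory and saved as OpenVINO IR. A minimal sketch (ResNet-18 from torchvision is just a convenient stand-in):

```python
import torch
import torchvision
import openvino as ov

# Any torch.nn.Module works; example_input lets the converter trace the graph.
torch_model = torchvision.models.resnet18(weights="DEFAULT").eval()
ov_model = ov.convert_model(torch_model, example_input=torch.randn(1, 3, 224, 224))
ov.save_model(ov_model, "resnet18.xml")  # IR files then run on CPU, GPU or NPU
```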
- RAG (Retrieval-Augmented Generation) applications are supported through the LangChain integration
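A compact sketch of a fully local RAG loop with LangChain's OpenVINO-backed embeddings and LLM (model IDs and the toy document are placeholders; assumes langchain-community, optimum-intel and faiss-cpu are installed):

```python
from langchain_community.embeddings import OpenVINOEmbeddings
from langchain_community.llms.huggingface_pipeline import HuggingFacePipeline
from langchain_community.vectorstores import FAISS

# Embed documents locally with an OpenVINO-accelerated embedding model.
embeddings = OpenVINOEmbeddings(
    model_name_or_path="BAAI/bge-small-en-v1.5",  # placeholder embedding model
    model_kwargs={"device": "CPU"},
)
db = FAISS.from_texts(["OpenVINO runs models on CPU, GPU and NPU."], embeddings)

# Run the LLM locally through LangChain's OpenVINO backend.
llm = HuggingFacePipeline.from_model_id(
    model_id="TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # placeholder LLM
    task="text-generation",
    backend="openvino",
    model_kwargs={"device": "GPU"},
    pipeline_kwargs={"max_new_tokens": 128},
)

question = "Which devices can OpenVINO target?"
docs = db.as_retriever().invoke(question)
context = "\n".join(d.page_content for d in docs)
print(llm.invoke(f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"))
```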
- All inference runs locally with no cloud dependencies; no data ever needs to leave the device
- Next-generation Lunar Lake processors will feature NPUs delivering 45 TOPS, about 4x the current generation