Dmitriy Pastushenkov, Adrian Boguszewski - Cloud? No Thanks! I’m Gonna Run GenAI on My AI PC

AI

Learn how to run GenAI models locally on Intel Core Ultra processors using OpenVINO. Discover efficient AI workload distribution across CPU, GPU & NPU without cloud dependencies.

Key takeaways
  • Intel Core Ultra processors include three AI engines - CPU, integrated GPU, and NPU (Neural Processing Unit) - each optimized for different AI workloads

  • The NPU enables low-power AI acceleration (around 20 W vs. 40 W on the CPU), extending battery life while still delivering good performance for continuous background AI tasks

  • The OpenVINO toolkit deploys AI models across Intel hardware (CPU/GPU/NPU) with minimal code changes - switching devices is essentially a one-line change (see the device-selection sketch after this list)

  • The new OpenVINO GenAI library runs local LLMs in as few as three lines of code and pulls in far fewer dependencies than HuggingFace pipelines (see the GenAI sketch after this list)

  • Different AI workloads can be distributed to whichever engine suits them best (see the performance-hints sketch after this list):

    • CPU: Fast-response, low-latency tasks
    • GPU: High-throughput tasks such as chatbots
    • NPU: Background tasks requiring power efficiency
  • Models can be quantized and compressed (e.g., from 25 GB to 5 GB) while maintaining accuracy for local execution (see the compression sketch after this list)

  • OpenVINO supports multiple frameworks, including PyTorch, ONNX, and TensorFlow, and integrates with popular tools like HuggingFace and LangChain (see the conversion sketch after this list)

  • RAG (Retrieval-Augmented Generation) applications are supported through the LangChain integration (see the RAG sketch after this list)

  • All inference runs locally without cloud dependencies - no data needs to leave the device

  • Next-gen Lunar Lake processors will feature NPUs with 45 TOPS of performance (about 4x the current generation)
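
To make the "minimal code changes" point concrete, here is a minimal device-selection sketch using OpenVINO's Python API. The path `model.xml` is a hypothetical, already-converted OpenVINO IR file, not something from the talk.

```python
# Minimal sketch: the same model, compiled for three different AI engines.
import openvino as ov

core = ov.Core()
print(core.available_devices)  # e.g. ['CPU', 'GPU', 'NPU'] on a Core Ultra machine

model = core.read_model("model.xml")  # hypothetical pre-converted IR model

# Switching hardware is just a different device string; the surrounding
# inference code stays the same.
compiled_cpu = core.compile_model(model, "CPU")
compiled_gpu = core.compile_model(model, "GPU")
compiled_npu = core.compile_model(model, "NPU")
```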
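The three-lines-of-code claim maps onto the OpenVINO GenAI `LLMPipeline` API, sketched below. `TinyLlama-ov` is a hypothetical local folder holding an LLM already exported to OpenVINO format.

```python
# Minimal GenAI sketch: load a local LLM and generate text, no cloud involved.
import openvino_genai

pipe = openvino_genai.LLMPipeline("TinyLlama-ov", "GPU")  # or "CPU" / "NPU"
print(pipe.generate("What is an NPU?", max_new_tokens=100))
```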
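One way to express the workload distribution in code is through OpenVINO's standard performance hints, sketched below with the same hypothetical `model.xml`. The device-to-task mapping mirrors the takeaway list; the hint properties are regular OpenVINO configuration, not something demonstrated in the talk.

```python
# Sketch: match each workload to the engine and hint that suits it.
import openvino as ov
import openvino.properties.hint as hints

core = ov.Core()
model = core.read_model("model.xml")  # hypothetical pre-converted IR model

# Interactive task: tune the CPU for response time.
low_latency = core.compile_model(
    model, "CPU", {hints.performance_mode: hints.PerformanceMode.LATENCY}
)

# Chatbot-style batch work: tune the GPU for throughput.
high_throughput = core.compile_model(
    model, "GPU", {hints.performance_mode: hints.PerformanceMode.THROUGHPUT}
)

# Always-on background task: run on the NPU to save power.
background = core.compile_model(model, "NPU")
```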
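For the compression takeaway, the sketch below uses NNCF 4-bit weight compression, which is one route to the kind of 25 GB to 5 GB reduction mentioned above. `llm.xml` is a hypothetical IR file, and the INT4 asymmetric mode is an assumed choice, not one stated in the talk.

```python
# Sketch: shrink an LLM's weights to INT4 so it fits on a laptop.
import openvino as ov
import nncf

core = ov.Core()
model = core.read_model("llm.xml")  # hypothetical large-language-model IR

# Weights drop to 4-bit integers while activations stay in floating point,
# which is how model size shrinks several-fold with little accuracy loss.
compressed = nncf.compress_weights(model, mode=nncf.CompressWeightsMode.INT4_ASYM)
ov.save_model(compressed, "llm-int4.xml")
```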
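Framework support can be shown with a conversion sketch: a PyTorch model goes through `ov.convert_model` and is saved as OpenVINO IR. torchvision's `resnet50` is only a stand-in example model.

```python
# Sketch: convert a stock PyTorch model to OpenVINO IR.
import torch
import torchvision
import openvino as ov

torch_model = torchvision.models.resnet50(weights="DEFAULT").eval()

# example_input lets OpenVINO trace the model's shapes during conversion.
ov_model = ov.convert_model(torch_model, example_input=torch.rand(1, 3, 224, 224))
ov.save_model(ov_model, "resnet50.xml")
```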
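Finally, a RAG sketch wiring LangChain's OpenVINO integrations into a tiny retrieval loop. The model IDs, sample texts, and prompt format are illustrative assumptions, not details from the talk; everything still runs on the local machine.

```python
# Sketch: a minimal local RAG pipeline with OpenVINO-backed LangChain parts.
from langchain_community.embeddings import OpenVINOEmbeddings
from langchain_community.llms import HuggingFacePipeline
from langchain_community.vectorstores import FAISS

# Embeddings run locally on the CPU.
embeddings = OpenVINOEmbeddings(
    model_name_or_path="sentence-transformers/all-MiniLM-L6-v2",
    model_kwargs={"device": "CPU"},
)
store = FAISS.from_texts(
    ["OpenVINO runs models on CPU, GPU and NPU.", "NPUs are power efficient."],
    embeddings,
)

# The LLM runs locally on the GPU via the OpenVINO backend.
llm = HuggingFacePipeline.from_model_id(
    model_id="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    task="text-generation",
    backend="openvino",
    model_kwargs={"device": "GPU"},
)

# Retrieve relevant snippets and stuff them into the prompt.
question = "Which engine should a background task use?"
context = "\n".join(d.page_content for d in store.similarity_search(question, k=2))
print(llm.invoke(f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"))
```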