Model size, combined with the limited hardware resources of client devices (for example, disk, RAM, or CPU), makes it increasingly challenging to deploy large language models (LLMs) on laptops compared with cloud-based solutions. The AI PC from Intel addresses this challenge by combining a CPU, GPU, and NPU in one device.
This session focuses on the NPU and showcases how to prototype and deploy LLM applications locally. It covers:
- How the NPU architecture works, including its features, advantages, and capabilities for accelerating neural network computations on Intel® Core™ Ultra processors (the backbone of AI PCs from Intel).
- Practical aspects of deploying performant LLM apps on Intel NPUs—from initial setup to optimization and system partitioning—using the OpenVINO™ toolkit and its NPU plug-in (see the device-discovery and compilation sketch after this list).
- What LLMs are, plus the advantages and challenges of local inference.
- Fast LLM prototyping on Intel Core Ultra processors using the Intel® NPU Acceleration Library (see the PyTorch compilation sketch after this list).
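For the deployment workflow mentioned above, a minimal sketch of device discovery and model compilation with the OpenVINO Python API is shown below. The IR file name ("model.xml") and the CPU fallback are illustrative assumptions, not part of the session materials.

```python
# Minimal sketch: discover devices and compile a model for the NPU plug-in
# with the OpenVINO Python API. "model.xml" is a placeholder for an OpenVINO
# IR model you have already exported (an assumption for this example).
import openvino as ov

core = ov.Core()

# On an Intel Core Ultra AI PC this typically reports CPU, GPU, and NPU.
print("Available devices:", core.available_devices)

# Target the NPU when present; otherwise fall back to the CPU.
device = "NPU" if "NPU" in core.available_devices else "CPU"
compiled_model = core.compile_model("model.xml", device_name=device)
print(f"Model compiled for {device}")
```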
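Similarly, a rough sketch of fast prototyping with the Intel NPU Acceleration Library might look like the following. The Hugging Face model ID, int8 dtype, and prompt are assumptions chosen for illustration; consult the library's documentation for supported models and options.

```python
# Rough sketch: compile a Hugging Face causal-LM for the NPU with the
# Intel NPU Acceleration Library, then generate text. The model ID, int8
# quantization, and prompt below are illustrative assumptions.
import torch
import intel_npu_acceleration_library
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # assumed small model for a laptop
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Compile the PyTorch model so its supported layers run on the NPU.
model = intel_npu_acceleration_library.compile(model, dtype=torch.int8)

inputs = tokenizer("What is an NPU?", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```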
Get real-world examples and case studies (such as chatbots and retrieval-augmented generation [RAG]) that showcase how LLM applications integrate with NPUs, and how this combination can unlock performance and efficiency.
Skill level: All