Prototype and Deploy LLM Applications on Intel® NPUs

Presented by

Intel Tiber

About this talk

Model size, combined with the limited hardware resources of client devices (for example, disk, RAM, or CPU), makes it increasingly challenging to deploy large language models (LLMs) on laptops compared to cloud-based solutions. The AI PC from Intel addresses this by combining a CPU, GPU, and NPU in one device. This session focuses on the NPU and shows how to prototype and deploy LLM applications locally. It also covers:

- How the NPU architecture works, including its features, advantages, and capabilities for accelerating neural network computations on Intel® Core™ Ultra processors (the backbone of AI PCs from Intel).
- Practical aspects of deploying performant LLM apps on Intel NPUs, from initial setup to optimization and system partitioning, using the OpenVINO™ toolkit and its NPU plug-in (see the first sketch after this list).
- What LLMs are, and the advantages and challenges of local inference.
- Fast LLM prototyping on Intel Core Ultra processors using the Intel® NPU Acceleration Library (see the second sketch after this list).

Get real-world examples and case studies (like chatbots and retrieval-augmented generation [RAG]) that showcase the seamless integration of LLM applications with NPUs, including how this synergy can unlock performance and efficiency.

Skill level: All
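
As a starting point for the OpenVINO deployment flow mentioned above, here is a minimal sketch. It assumes OpenVINO is installed (pip install openvino) on an Intel Core Ultra system with NPU drivers, and that "model.xml" is a placeholder for a model already converted to OpenVINO IR format.

    # Minimal sketch: verify the NPU is visible to OpenVINO and compile a model for it.
    import openvino as ov

    core = ov.Core()
    print(core.available_devices)  # on an AI PC this typically includes 'CPU', 'GPU', and 'NPU'

    # "model.xml" is a placeholder for a model already converted to OpenVINO IR
    compiled_model = core.compile_model("model.xml", device_name="NPU")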
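
For the fast-prototyping path with the Intel NPU Acceleration Library, the sketch below loads a small Hugging Face model and compiles it for the NPU. The compile call and its arguments follow the library's published examples but may differ between versions, and the model name is only an illustration; treat both as assumptions.

    # Hedged sketch: offload a small causal LM to the NPU with intel_npu_acceleration_library.
    # Assumes: pip install intel-npu-acceleration-library transformers torch
    import torch
    import intel_npu_acceleration_library
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # example model; swap in your own
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)
    tokenizer = AutoTokenizer.from_pretrained(model_id)

    # Compile the PyTorch model so supported layers run on the NPU
    # (exact arguments vary by library version; check the current documentation)
    model = intel_npu_acceleration_library.compile(model, dtype=torch.float16)

    inputs = tokenizer("What is an NPU?", return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=50)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))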

More from this channel

As we adapt to the rapidly growing demand for AI applications, significant challenges are bound to arise. Streamlining the process of machine learning,
complying with regulatory frameworks worldwide, ensuring security and privacy, and controlling cloud computing costs have become increasingly crucial
priorities. Your work is instrumental in overcoming these challenges. The tools you use matter, and the foundation you build on matters even more. With the
Intel® Tiber™ portfolio, we’re partnering with our customers to harness the power of AI and other cutting-edge technologies to move the world forward, e…