LLM Flow

The Ryzen AI Software includes support for deploying quantized LLMs on the NPU using an eager execution mode, which simplifies model ingestion. Instead of compiling and executing the model as a complete graph, eager mode processes it on an operator-by-operator basis: compute-intensive operations, such as GEMM/MATMUL, are dynamically offloaded to the NPU, while the remaining operators execute on the CPU. Eager mode for LLMs is supported in both PyTorch and ONNX Runtime.
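To make the dispatch pattern concrete, here is a minimal PyTorch sketch of operator-by-operator offload: a drop-in replacement for `nn.Linear` routes the compute-intensive matrix multiply through a single choke point where an NPU kernel would be invoked, while all other operators run on the CPU as usual. The `NPULinear` module and `offload_linears` helper are illustrative placeholders, not the actual Ryzen AI API.

```python
import torch
import torch.nn as nn

class NPULinear(nn.Module):
    """Illustrative stand-in for an NPU-dispatching linear layer."""

    def __init__(self, linear: nn.Linear):
        super().__init__()
        self.weight = linear.weight
        self.bias = linear.bias

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # In the real flow, this GEMM/MATMUL is dynamically offloaded to
        # the NPU; this sketch simply falls back to the CPU kernel.
        y = torch.matmul(x, self.weight.t())
        if self.bias is not None:
            y = y + self.bias
        return y

def offload_linears(module: nn.Module) -> nn.Module:
    """Recursively replace every nn.Linear with the NPU-dispatching version."""
    for name, child in module.named_children():
        if isinstance(child, nn.Linear):
            setattr(module, name, NPULinear(child))
        else:
            offload_linears(child)
    return module
```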

A general-purpose flow is available in the amd/RyzenAI-SW repository; a usage sketch follows the list below.

  • Applicability: prototyping and early development with a broad set of LLMs

  • Performance: functional support only; not intended for benchmarking

  • Supported platforms: PHX, HPT, STX (and onwards)

  • Supported frameworks: PyTorch

  • Supported models: many (see the repository for the current list)
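As a rough sketch of how the general-purpose PyTorch flow is typically driven, the snippet below loads a Hugging Face model, hands it to a transform step, and generates text in eager mode. The `transform_for_npu` function is a hypothetical placeholder for the RyzenAI-SW tooling, which quantizes the model and swaps its GEMM/MATMUL layers for NPU-backed equivalents; consult the repository for the actual entry points.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def transform_for_npu(model):
    # Hypothetical placeholder: the RyzenAI-SW flow quantizes the model and
    # replaces its compute-intensive layers with NPU-dispatching versions.
    return model

model_id = "facebook/opt-1.3b"  # any supported Hugging Face causal LM
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

model = transform_for_npu(model)  # eager mode: ops are dispatched one by one

inputs = tokenizer("Ryzen AI runs LLMs by", return_tensors="pt")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```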

A set of performance-optimized models is available upon request on the AMD secure download site: https://account.amd.com/en/member/ryzenai-sw-ea.html. An ONNX Runtime session sketch follows the list below.

  • Applicability: benchmarking and deployment of specific LLMs

  • Performance: highly optimized

  • Supported platforms: STX (and onwards)

  • Supported frameworks: PyTorch and ONNX Runtime

  • Supported models: Llama2, Llama3, Qwen1.5
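For the ONNX Runtime path, the sketch below shows the general shape of session setup with the Vitis AI execution provider used by Ryzen AI. The model filename, config path, and single `input_ids` input are assumptions for illustration (a real LLM session also feeds attention masks and past key/value tensors on each decoding step); refer to the documentation shipped with the optimized packages for the exact invocation.

```python
import numpy as np
import onnxruntime as ort

# Filenames are assumptions; the optimized packages from the secure download
# site ship their own model artifacts and runtime configuration.
session = ort.InferenceSession(
    "llama2_quantized.onnx",
    providers=["VitisAIExecutionProvider"],
    provider_options=[{"config_file": "vaip_config.json"}],
)

# A toy prompt as token ids; a real decoding loop would also pass attention
# masks and cached key/value inputs.
input_ids = np.array([[1, 15043, 3186]], dtype=np.int64)
outputs = session.run(None, {"input_ids": input_ids})
print(outputs[0].shape)  # logits over the vocabulary
```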