Compiling Operators for OGA Models
Ryzen AI currently supports many popular LLMs in both hybrid and NPU-only flows. For these models, the required operators are already compiled and included in the Ryzen AI runtime. Such models can be run directly on Ryzen AI without any additional preparation.
When users fine-tune these models, only the weights change and no new operator shapes are introduced. In that case, follow the steps in Preparing OGA Models to prepare the model; the resulting model runs on the Ryzen AI runtime using the precompiled operators.
However, in cases where architectural changes introduce new operator shapes not available in the Ryzen AI runtime, additional operator compilation is required. This page provides a recipe to compile operators that are not already present in the runtime. This flow is experimental, and results may vary depending on the extent of the architectural changes.
Note
All OGA models are currently based on the ONNX Runtime GenAI Model Builder architecture. Therefore, this operator compilation flow requires that the models be supported by ONNX Runtime GenAI.
Operator Compilation Flow (Hybrid Execution)
Currently this flow is primarily supported for hybrid execution.
Ensure the model is quantized by following the quantization recipe.
Build the OGA DML model using the ONNX Runtime GenAI Model Builder included in the Ryzen AI software environment:
conda activate ryzen-ai-1.6.0
python -m onnxruntime_genai.models.builder \
-i <quantized model folder> -o <dml model folder> \
-p int4 -e dml
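Optionally, you can inspect which operator types the exported graph contains before compiling. The snippet below is a minimal sketch using the onnx Python package; it assumes the Model Builder wrote the graph to model.onnx inside the DML model folder, so adjust the path if your output differs.
import onnx

# Load only the graph structure; skip the external weight data.
model = onnx.load(r"<dml model folder>\model.onnx", load_external_data=False)

# Print the unique operator types present in the exported graph.
for op_type in sorted({node.op_type for node in model.graph.node}):
    print(op_type)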
Compile the operators extracted from the OGA DML model:
onnx_utils vaiml --model-dir <dml model folder> --plugin_name <plugin name> --compile --ops_type bfp16
This generates a compiled operator package at transaction-plugin\<plugin name>.zip.
Generate the hybrid model:
Create a folder named dd_plugins in the current working directory and place <plugin name>.zip inside it (a scripted version of this setup is sketched after this step). By default, the flow looks for the operator zip in dd_plugins. To use a different location, see "Additional Details" below.
Then run:
model_generate --hybrid <output hybrid model folder> <dml model folder>
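For reference, the dd_plugins setup described above can also be scripted. This is a minimal sketch with placeholder paths taken from the compile step; substitute your actual plugin name and compile output location.
import shutil
from pathlib import Path

# Create dd_plugins in the current working directory (no error if it already exists).
dd_plugins = Path("dd_plugins")
dd_plugins.mkdir(exist_ok=True)

# Copy the compiled operator package produced by the compile step.
shutil.copy(Path("transaction-plugin") / "<plugin name>.zip", dd_plugins)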
Run the hybrid model:
Follow the official guide to copy model_benchmark.exe and the required DLL dependencies to the current working directory. Then run:
.\model_benchmark.exe -i <hybrid_model_folder> -f amd_genai_prompt.txt -l "128, 256, 512, 1024, 2048" --verbose
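Beyond the benchmark tool, the hybrid model can also be exercised from Python with the onnxruntime_genai API bundled in the Ryzen AI environment. The following is a minimal sketch, not an official example; the exact generator calls (for instance append_tokens) vary between onnxruntime_genai versions, so check the API of the version you have installed.
import onnxruntime_genai as og

# Load the generated hybrid model folder.
model = og.Model(r"<hybrid_model_folder>")
tokenizer = og.Tokenizer(model)
stream = tokenizer.create_stream()

params = og.GeneratorParams(model)
params.set_search_options(max_length=256)

generator = og.Generator(model, params)
generator.append_tokens(tokenizer.encode("What is Ryzen AI?"))

# Generate and print tokens one at a time.
while not generator.is_done():
    generator.generate_next_token()
    print(stream.decode(generator.get_next_tokens()[0]), end="", flush=True)
print()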
Additional Details
Path to operator zip file
If <plugin name>.zip is not placed in the dd_plugins folder, set the DD_PLUGINS_ROOT environment variable to point to its location:
set DD_PLUGINS_ROOT=C:\<path\to\folder\containing\<plugin name>.zip>
Enabling tracing
To enable tracing for debug purposes, set the DD_PLUGINS_TRACING environment variable before generating the hybrid model:
# Optional: enable tracing
set DD_PLUGINS_TRACING=1
# Generate the model
model_generate --hybrid <output hybrid model folder> <dml model folder>