Model Deployment#
ONNX Runtime with Vitis AI Execution Provider#
After the model is quantized, you can deploy it with ONNX Runtime through the C++ or Python APIs, using the Vitis AI Execution Provider (VAI EP) for the inference session:
providers = ['VitisAIExecutionProvider']
session = ort.InferenceSession(model, sess_options=sess_opt,
                               providers=providers,
                               provider_options=provider_options)
Provider Options#
VAI EP supports three provider options:
| Provider Options | Type | Default | Description |
|---|---|---|---|
| config_file | Mandatory | None | The configuration file. |
| cacheDir | Optional | | The cache directory. |
| cacheKey | Optional | {onnx_model_md5} | Used to distinguish between the models. |
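As a minimal sketch, all three options can be passed as a dictionary in the provider_options list when the session is created; the paths, model file name, and cache key below are placeholders:
import onnxruntime as ort

# Vitis AI EP provider options; paths and cache key are placeholders
provider_options = [{
    "config_file": "/path/to/vaip_config.json",  # mandatory
    "cacheDir": "/path/to/cache/directory",      # optional
    "cacheKey": "my_quantized_model"             # optional, defaults to the ONNX model MD5
}]

sess_opt = ort.SessionOptions()
session = ort.InferenceSession("model.onnx",
                               sess_options=sess_opt,
                               providers=["VitisAIExecutionProvider"],
                               provider_options=provider_options)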
Environment Variables#
Additionally, the following environment variables can be used to control the Ryzen AI ONNX Runtime-based deployment.
| Environment Variable | Type | Default | Description |
|---|---|---|---|
| XLNX_VART_FIRMWARE | Mandatory | None | The IPU binary. |
| XLNX_ENABLE_CACHE | Optional | 1 | If unset, the runtime flow ignores the cache directory and recompiles the model. |
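These variables must be set before the inference session is created. A sketch of doing this from Python is shown below; the path to the IPU binary is a placeholder:
import os

# Placeholder path to the IPU binary; set before creating the inference session
os.environ["XLNX_VART_FIRMWARE"] = "C:\\path\\to\\ipu_binary.xclbin"

# Leaving XLNX_ENABLE_CACHE unset makes the runtime ignore the cache
# directory and recompile the model
os.environ.pop("XLNX_ENABLE_CACHE", None)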
Python API Example#
import onnxruntime
# Add user imports
# ...
# Load inputs and perform preprocessing
# ...
# Create an inference session using the Vitis AI execution provider
session = onnxruntime.InferenceSession(
    '[model_file].onnx',
    providers=["VitisAIExecutionProvider"],
    provider_options=[{"config_file": "/path/to/vaip_config.json"}])
input_shape = session.get_inputs()[0].shape
input_name = session.get_inputs()[0].name
# Load inputs and preprocess them according to input_shape
input_data = [...]
result = session.run([], {input_name: input_data})
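If real inputs are not wired up yet, a random tensor shaped according to input_shape can stand in for preprocessed data. This sketch assumes a single float32 input and substitutes 1 for any dynamic dimension:
import numpy as np

# Replace symbolic/dynamic dimensions (e.g. the batch size) with 1
concrete_shape = [d if isinstance(d, int) else 1 for d in input_shape]

# Random float32 data standing in for a preprocessed input
input_data = np.random.rand(*concrete_shape).astype(np.float32)
result = session.run([], {input_name: input_data})
print(result[0].shape)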
C++ API Example#
// ...
#include <experimental_onnxruntime_cxx_api.h>
#include <string>
#include <unordered_map>
#include <vector>
// include user header files
// ...
auto onnx_model_path = std::string("resnet50_pt.onnx");
Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "resnet50_pt");
auto session_options = Ort::SessionOptions();
auto options = std::unordered_map<std::string, std::string>({});
options["config_file"] = "/path/to/vaip_config.json";
options["cacheDir"] = "/path/to/cache/directory";
options["cacheKey"] = "abcdefg"; // Replace abcdefg with your model name, eg. onnx_model_md5
// Create an inference session using the Vitis AI execution provider
session_options.AppendExecutionProvider("VitisAI", options);
auto session = Ort::Experimental::Session(env, onnx_model_path, session_options);
auto input_shapes = session.GetInputShapes();
// preprocess input data
// ...
// Create input tensors and populate input data
std::vector<float> input_data; // filled during the preprocessing step above
std::vector<Ort::Value> input_tensors;
input_tensors.push_back(Ort::Experimental::Value::CreateTensor<float>(
input_data.data(), input_data.size(), input_shapes[0]));
auto output_tensors = session.Run(session.GetInputNames(), input_tensors,
session.GetOutputNames());
// postprocess output data
// ...
Model Operators Assignment Report#
Vitis AI EP generates a file named vitisai_ep_report.json that reports the model operator assignments across the CPU and the IPU. The report contains device statistics such as the total number of nodes and the number of nodes running on the CPU and on the DPU. It also lists all operator types in the model, indicating which operators run on the CPU and which run on the DPU, as well as per-node statistics such as a node's inputs, the operation applied, and its outputs:
{
  "deviceStat": [
    {
      "name": "all",
      "nodeNum": 402,
      "supportedOpType": [
        "::Add",
        ...
      ]
    },
    {
      "name": "CPU",
      "nodeNum": 2,
      "supportedOpType": [
        "::DequantizeLinear",
        ...
      ]
    },
    {
      "name": "DPU",
      "nodeNum": 400,
      "supportedOpType": [
        "::Add",
        ...
      ]
    }
  ],
  ...
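As a quick sanity check, the report can be parsed to print how many nodes landed on each device; this sketch relies only on the fields shown in the excerpt above:
import json

with open("vitisai_ep_report.json") as f:
    report = json.load(f)

# Print the node count per device ("all", "CPU", "DPU")
for device in report["deviceStat"]:
    print(f'{device["name"]}: {device["nodeNum"]} nodes')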