NPU Management Interface#
Introduction#
The xrt-smi
utility is a command-line interface to monitor and manage the NPU integrated AMD CPUs.
It is installed in C:\Windows\System32\AMD
and it can be directly invoked from within the conda environment created by the Ryzen AI Software installer.
The xrt-smi
utility currently supports three primary commands:
examine
- generates reports related to the state of the AI PC and the NPU.validate
- executes sanity tests on the NPU.configure
- manages the performance level of the NPU.
By default, the output of the xrt-smi examine
and xrt-smi validate
commands goes to the terminal. It can also be written to file in JSON format as shown below:
xrt-smi examine -f JSON -o <path/to/output.json>
The utility also support the following options which can be used with any command:
--help
- help to use xrt-smi or one of its sub commands--version
- report the version of XRT, driver and firmware--verbose
- turn on verbosity--batch
- enable batch mode (disables escape characters)--force
- when possible, force an operation. Eg - overwrite a file in examine or validate
The xrt-smi
utility requires Microsoft Visual C++ Redistributable (version 2015 to 2022) to be installed.
Overview of Key Commands#
Command |
Description |
---|---|
examine |
system config, device name |
examine –report platform |
performance mode, power |
examine –report aie-partitions |
hw contexts |
examine –report telemetry |
opcode trace (Strix NPU) or stream buffer tokens (Phoenix NPU) |
validate –run latency |
latency test |
validate –run throughput |
throughput test |
validate –run gemm |
INT8 GEMM test TOPS. This is a full array test and it should not be run while another workload is running. NOTE: This command is not supported on PHX and HPT NPUs. |
validate –run aie-reconfig-overhead |
NPU array reconfiguration overhead |
validate –run quick |
runs a small test suite |
configure –pmode <mode> |
set performance mode |
📝 NOTE: The examine --report aie-partition
and examine --report telemetry
commands report runtime information. These commands should be used when a model is running on the NPU. You can run these commands in a loop to see live updates of the reported data.
xrt-smi examine#
System Information#
Reports OS/system information of the AI PC and confirm the presence of the AMD NPU.
xrt-smi examine
Sample Command Line Output:
System Configuration
OS Name : Windows NT
Release : 26100
Machine : x86_64
CPU Cores : 16
Memory : 64879 MB
Distribution : Microsoft Windows 11 Pro
Model : KoratPlus-KRK
BIOS Vendor : AMD
BIOS Version : WXC4A30N
XRT
Version : 2.19.0
Branch : HEAD
Hash : 090e3faccd90abd21e59a4edbf7ed9d9c1016d0b
Hash Date : 2024-11-08 19:54:53
NPU Driver Version : 32.0.203.237
NPU Firmware Version : 1.0.0.63
Device(s) Present
|BDF |Name |
|----------------|------|
|[00c5:00:01.1] |NPU |
Sample JSON Output:
{
"schema_version": {
"schema": "JSON",
"creation_date": "Tue Nov 5 16:34:17 2024 GMT"
},
"system": {
"host": {
"os": {
"sysname": "Windows NT",
"release": "26100",
"machine": "x86_64",
"distribution": "Microsoft Windows 11 Pro",
"model": "BIRMANPLUS",
"hostname": "XSJSTRIX23",
"memory_bytes": "0x7d6e0a000",
"cores": "24",
"bios_vendor": "AMD",
"bios_version": "TXB1001C"
},
"xrt": {
"version": "2.18.0",
"branch": "HEAD",
"hash": "090e3faccd90abd21e59a4edbf7ed9d9c1016d0b",
"build_date": "2024-10-30 19:53:17",
"drivers": [
{
"name": "NPU Driver",
"version": "32.0.203.237"
}
]
},
"devices": [
{
"bdf": "00c5:00:01.1",
"device_class": "Ryzen",
"name": "NPU",
"id": "0x0",
"firmware_version": "1.0.0.61",
"instance": "mgmt(inst=1)",
"is_ready": "true"
}
]
}
}
}
Platform Information#
Reports more detailed information about the NPU, such as the performance mode and power consumption.
xrt-smi examine --report platform
Sample Command Line Output:
---------------------
[00c5:00:01.1] : NPU
---------------------
Platform
Name : NPU
Performance Mode : Default
Power : 1.277 Watts
📝 NOTE: Power reporting is not supported on PHX and HPT NPUs. Power reporting is only available on STX devices and onwards.
NPU Partitions#
Reports details about the NPU partition and column occupancy on the NPU.
xrt-smi examine --report aie-partitions
Sample Command Line Output:
---------------------
[00c5:00:01.1] : NPU
---------------------
AIE Partitions
Total Column Utilization: 50%
Partition Index: 0
Columns: [0, 1, 2, 3]
HW Contexts:
|PID |Ctx ID |Status |Instr BO |Sub |Compl |Migr |Err |Prio |GOPS |EGOPS |FPS |Latency |
|-------|--------|--------|----------|-----|-------|------|-----|--------|------|-------|-----|---------|
|20696 |0 |Active |64 KB |57 |56 |0 |0 |Normal |0 |0 |0 |0 |
NPU Context Bindings#
Reports details about the columns to NPU HW context binding.
xrt-smi examine --report aie-partitions --verbose
Sample Command Line Output:
Verbose: Enabling Verbosity
Verbose: SubCommand: examine
---------------------
[00c5:00:01.1] : NPU
---------------------
AIE Partitions
Total Column Utilization: 50%
Partition Index: 0
Columns: [0, 1, 2, 3]
HW Contexts:
|PID |Ctx ID |Status |Instr BO |Sub |Compl |Migr |Err |Prio |GOPS |EGOPS |FPS |Latency |
|-------|--------|--------|----------|-----|-------|------|-----|--------|------|-------|-----|---------|
|20696 |0 |Active |64 KB |57 |56 |0 |0 |Normal |0 |0 |0 |0 |
AIE Columns
|Column ||HW Context Slot |
|--------||-----------------|
|0 ||[1] |
|1 ||[1] |
|2 ||[1] |
|3 ||[1] |
Telemetry#
Reports details about the ctrlcode opcode trace (on Strix NPU) or stream buffer tokens (on Phoenix NPU).
xrt-smi examine -r telemetry
Sample Command Line Output on a STX NPU:
---------------------
[00c5:00:01.1] : NPU
---------------------
Telemetry
|Mailbox Opcode |Count |
|----------------|-------|
|0 |264 |
|1 |4 |
|2 |266 |
|3 |266 |
|4 |266 |
|5 |2 |
|6 |262 |
|7 |264 |
|8 |3 |
|9 |266 |
|10 |266 |
|11 |266 |
|12 |266 |
|13 |257 |
|14 |257 |
|15 |0 |
|16 |0 |
|17 |0 |
|18 |0 |
|19 |0 |
|20 |0 |
|21 |0 |
|22 |0 |
|23 |0 |
|24 |0 |
|25 |0 |
|26 |0 |
|27 |0 |
|28 |0 |
|29 |0 |
Sample Command Line Output on a PHX NPU:
---------------------
[00c3:00:01.1] : NPU
---------------------
Telemetry
|Stream Buffer |Tokens |
|---------------|--------|
|0 |194 |
|1 |194 |
|2 |194 |
|3 |194 |
xrt-smi validate#
Executing a Sanity Check on the NPU#
Runs a set of built-in NPU sanity tests which includes verify, df-bw, tct and gemm.
Note: All tests are run in performance mode.
latency
- this test executes a no-op control code and measures the end-to-end latency on all columnsthroughput
- this test loops back the input data from DDR through a MM2S Shim DMA channel back to DDR through a S2MM Shim DMA channel. The data movement within the AIE array follows the lowest latency path i.e. movement is restricted to just the Shim tile.cmd-chain-latency
- this runs the same latency test as above but uses command chaining APIscmd-chain-throughput
- this runs the same throughout test as above but uses command chaining APIstct-one-col
- This test runs a control code with one time DMA bd config and 10K transfer iterations with token_check on one column. It then measures the average time that NPU takes to receive TCT (Task Completion Token)tct-all-col
- This test runs a similar DPU sequence as tct-one-col on multiple concurrent contexts on different columns, It measures the of concurrent TCT processing.gemm
- An INT8 GeMM kernel is deployed on all 32 cores by the application. Each core is storing cycle count in the core data memory. The cycle count is read by the firmware. The TOPS application uses the “XBUTIL” tool to capture the IPUHCLK while the workload runs. Once all cores are executed, the cycle count from all cores will be synced back to the host. Finally, the application uses IPUHCLK, core cycle count, and GeMM kernel size to calculate the TOPS. This is a full array test and it should not be run while another workload is running. NOTE: This command is not supported on PHX and HPT NPUs.aie-reconfig-overhead
- This test runs a DPU sequence which configures the NPU array. This test calculates the overhead for this reconfiguration.spatial-sharing-overhead
- This test calculates the time overhead it takes for context switching when a workload runs through spatial-sharing of AIE columns. This is the case, where a workload has exclusive ownership of the partition. No other workload can execute on that partition.temporal-sharing-overhead
- This test calculates the time overhead it takes for context switching when a workload runs through temporal sharing of AIE columns. This is the case, where multiple workloads share the same partition. When a workload is executing on the partition, no other workload sharing the same partition can be scheduled.
xrt-smi validate --run all
📝 NOTE: Some sanity checks may fail if other applications (for example MEP, Microsoft Experience Package) are also using the NPU.
Sample Command Line Output:
Validate Device : [00c5:00:01.1]
Platform : NPU
Power Mode : Performance
-------------------------------------------------------------------------------
Test 1 [00c5:00:01.1] : latency
Details : Average latency: 79.8 us
Test Status : [PASSED]
-------------------------------------------------------------------------------
Test 2 [00c5:00:01.1] : throughput
Details : Average throughput: 62169.1 ops
Test Status : [PASSED]
-------------------------------------------------------------------------------
Test 3 [00c5:00:01.1] : cmd-chain-latency
Details : Average latency: 18.7 us
Test Status : [PASSED]
-------------------------------------------------------------------------------
Test 4 [00c5:00:01.1] : cmd-chain-throughput
Details : Average throughput: 50869.4 ops
Test Status : [PASSED]
-------------------------------------------------------------------------------
Test 5 [00c5:00:01.1] : df-bw
Details : Average bandwidth per shim DMA: 47.1 GB/s
Test Status : [PASSED]
-------------------------------------------------------------------------------
Test 6 [00c5:00:01.1] : tct-one-col
Details : Average TCT throughput: 374708.7 TCT/s
Test Status : [PASSED]
-------------------------------------------------------------------------------
Test 7 [00c5:00:01.1] : tct-all-col
Details : Average TCT throughput: 749150.7 TCT/s
Test Status : [PASSED]
-------------------------------------------------------------------------------
Test 8 [00c5:00:01.1] : gemm
Details : Kernel name is 'DPU_1x4'
TOPS: 51.3
Test Status : [PASSED]
-------------------------------------------------------------------------------
Test 9 [00c5:00:01.1] : aie-reconfig-overhead
Details : Array reconfiguration overhead: 3.8 ms
Test Status : [PASSED]
-------------------------------------------------------------------------------
Test 10 [00c5:00:01.1] : spatial-sharing-overhead
Details : Overhead: 1543.8 ms
Test Status : [PASSED]
-------------------------------------------------------------------------------
Test 11 [00c5:00:01.1] : temporal-sharing-overhead
Details : Overhead: '4614.7' ms
Test Status : [PASSED]
-------------------------------------------------------------------------------
Validation completed. Please run the command '--verbose' option for more details
xrt-smi configure#
Managing the Performance Level of the NPU#
To set the performance level of the NPU, you can choose from the following modes: powersaver, balanced, performance, or default. Use the command below:
xrt-smi configure --pmode <default | powersaver | balanced | performance | turbo>
default
- adapts to the Windows Power Mode setting, which can be adjusted under System -> Power & battery -> Power mode. For finer control of the NPU settings, it is recommended to use the xrt-smi mode setting, which overrides the Windows Power mode and ensures optimal results.powersaver
- configures the NPU to prioritize power saving, preserving laptop battery life.balanced
- configures the NPU to provide a compromise between power saving and performance.performance
- configures the NPU to prioritize performance, consuming more power.turbo
- configures the NPU for maximum performance performance, requires AC power to be plugged in otherwise usesperformance
mode.
Example: Setting the NPU to high-performance mode
xrt-smi configure --pmode performance
To check the current performance level, use the following command:
xrt-smi examine --report platform