Running LLM on Linux#
This page walks through an example of running an LLM on the Ryzen AI NPU.
Open a Linux terminal and create a new working folder:
mkdir run_llm
cd run_llm
Choose any pre-quantized, post-processed, ready-to-run model from the Hugging Face collection of NPU models:
- Models with 4K context length
- Models with 16K context length
For this flow, “Phi-3.5-mini-instruct_rai_1.7.1_npu_4K” is used as the reference model.
# Make sure git-lfs is installed (https://git-lfs.com)
sudo apt install git-lfs
git lfs install
git clone https://huggingface.co/amd/Phi-3.5-mini-instruct_rai_1.7.1_npu_4K
Verify that RYZEN_AI_INSTALLATION_PATH is set:
# Activate the virtual environment created in Linux Installation step
source <TARGET-PATH>/venv/bin/activate
echo $RYZEN_AI_INSTALLATION_PATH
Collect the necessary files into the current working directory:
- Deployment folder - contains the libraries required to run the LLM
# Navigate to <TARGET-PATH>/venv and copy the "deployment" folder
cp -r <TARGET-PATH>/venv/deployment .
- Model benchmark script - the executable used to run the model
# Navigate to <TARGET-PATH>/venv/LLM/examples/ and copy "model_benchmark" file.
cp <TARGET-PATH>/venv/LLM/examples/model_benchmark .
- Prompt file - the input prompt for your LLM
# Navigate to <TARGET-PATH>/venv/LLM/examples/ and copy "amd_genai_prompt.txt" file.
cp <TARGET-PATH>/venv/LLM/examples/amd_genai_prompt.txt .
The current working directory should now contain the following files:
amd_genai_prompt.txt deployment model_benchmark Phi-3.5-mini-instruct_rai_1.7.1_npu_4K
Create a configuration file for the XRT driver named “xrt.ini”:
- vi xrt.ini (creates a new file)
- Add the lines below to the file and save it:
[Debug]
num_heap_pages = 8
- Set XRT_INI_PATH to point to this file
export XRT_INI_PATH=$PWD/xrt.ini
Lastly, set the required library paths:
export LD_LIBRARY_PATH=/lib/x86_64-linux-gnu:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=deployment/lib:$LD_LIBRARY_PATH
export RYZENAI_EP_PATH=$PWD/deployment/lib/libonnxruntime_providers_ryzenai.so
The model can now be run with the command below:
./model_benchmark -i Phi-3.5-mini-instruct_rai_1.7.1_npu_4K/ -l 128 -f amd_genai_prompt.txt
# Enable "-v" flag for verbose output
Expected output#
-----------------------------
Prompt Number of Tokens: 128
Batch size: 1, prompt tokens: 128, tokens to generate: 128
Prompt processing (time to first token):
avg (us): 148056
avg (tokens/s): 864.536
p50 (us): 148143
stddev (us): 375.335
n: 5 * 128 token(s)
Token generation:
avg (us): 56874.3
avg (tokens/s): 17.5826
p50 (us): 56250.6
stddev (us): 6743.11
n: 635 * 1 token(s)
Token sampling:
avg (us): 27.273
avg (tokens/s): 36666.3
p50 (us): 27.21
stddev (us): 0.202461
n: 5 * 1 token(s)
E2E generation (entire generation loop):
avg (ms): 7371.29
p50 (ms): 7378.4
stddev (ms): 14.3836
n: 5
Peak working set size (bytes): 12168941568
Preparing OGA Model#
Install the “model_generate” package in the current virtual environment:
pip install model-generate==1.7.1 --force-reinstall --no-deps --extra-index-url https://pypi.amd.com/ryzenai_llm/1.7.1/linux/simple/
Currently, Linux supports only the NPU flow. Read more about model generation on the Preparing OGA Models page.