## Model Table
| No | Model Name | Hybrid | NPU-only |
|---|---|---|---|
| 1 | Llama-2-7b-chat-hf | ✓ | ✓ |
| 2 | Llama-2-7b-hf | ✓ | ✓ |
| 3 | Meta-Llama-3-8B | ✓ | ✓ |
| 4 | Llama-3.1-8B | ✓ | ✓ |
| 5 | Meta-Llama-3.1-8B-Instruct | ✓ | ✓ |
| 6 | Llama-3.2-1B | ✓ | ✓ |
| 7 | Llama-3.2-1B-Instruct | ✓ | ✓ |
| 8 | Llama-3.2-3B | ✓ |  |
| 9 | Llama-3.2-3B-Instruct | ✓ |  |
| 10 | CodeLlama-7b-Instruct-hf | ✓ | ✓ |
| 11 | DeepSeek-R1-Distill-Llama-8B | ✓ | ✓ |
| 12 | DeepSeek-R1-Distill-Qwen-1.5B | ✓ | ✓ |
| 13 | Qwen-2.5-1.5B-Instruct | ✓ | ✓ |
| 14 | DeepSeek-R1-Distill-Qwen-7B | ✓ | ✓ |
| 15 | Phi-3-mini-4k-instruct | ✓ | ✓ |
| 16 | Phi-3-mini-128k-instruct | ✓ | ✓ |
| 17 | Phi-3.5-mini-instruct | ✓ | ✓ |
| 18 | Phi-4-mini-instruct | ✓ |  |
| 19 | Phi-4-mini-reasoning | ✓ |  |
| 20 | gemma-2-2b | ✓ |  |
| 21 | Mistral-7B-Instruct-v0.1 | ✓ | ✓ |
| 22 | Mistral-7B-Instruct-v0.2 | ✓ | ✓ |
| 23 | Mistral-7B-Instruct-v0.3 | ✓ | ✓ |
| 24 | Mistral-7B-v0.3 | ✓ | ✓ |
| 25 | AMD-OLMo-1B-SFT-DPO | ✓ |  |
| 26 | chatglm3-6b | ✓ | ✓ |
| 27 | Qwen1.5-7B-Chat | ✓ | ✓ |
| 28 | Qwen2-1.5B | ✓ | ✓ |
| 29 | Qwen2-7B | ✓ | ✓ |
| 30 | Qwen2.5-0.5B-Instruct | ✓ |  |
| 31 | Qwen2.5-7B-Instruct | ✓ | ✓ |
| 32 | Qwen2.5-Coder-0.5B-Instruct | ✓ |  |
| 33 | Qwen2.5-Coder-1.5B-Instruct | ✓ | ✓ |
| 34 | Qwen2.5-Coder-7B-Instruct | ✓ | ✓ |
| 35 | Qwen2.5-3B-Instruct | ✓ | ✓ |
| 36 | Qwen3-1.7B | ✓ |  |
| 37 | Qwen3-4B | ✓ |  |
| 38 | Qwen3-8B | ✓ |  |
## Notes
All models support context lengths of up to 4K, with the following exceptions:

- AMD-OLMo-1B-SFT-DPO: inherently supports only a 2K context length
- gemma-2-2b: supports up to a 3K context length
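If your application builds prompts dynamically, you may want to enforce these limits in code. The sketch below is a minimal, hypothetical example of tracking the per-model limits from the notes above; the `MAX_CONTEXT_TOKENS` mapping and `max_context_for` helper are illustrative names, not part of any official API.

```python
# Hypothetical helper for capping prompt length to each model's supported context window.
# Limits are taken from the notes above: 2K for AMD-OLMo-1B-SFT-DPO, 3K for gemma-2-2b,
# and 4K for all other listed models.
MAX_CONTEXT_TOKENS = {
    "AMD-OLMo-1B-SFT-DPO": 2048,  # 2K context
    "gemma-2-2b": 3072,           # 3K context
}
DEFAULT_MAX_CONTEXT = 4096        # 4K context for all other listed models


def max_context_for(model_name: str) -> int:
    """Return the maximum supported context length (in tokens) for a model."""
    return MAX_CONTEXT_TOKENS.get(model_name, DEFAULT_MAX_CONTEXT)


def truncate_tokens(token_ids: list[int], model_name: str) -> list[int]:
    """Drop the oldest tokens so the prompt fits within the model's context window."""
    limit = max_context_for(model_name)
    return token_ids[-limit:]
```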