Model Table

| No | Model Name                    | Hybrid | NPU-only |
|----|-------------------------------|--------|----------|
| 1  | Llama-2-7b-chat-hf            |        |          |
| 2  | Llama-2-7b-hf                 |        |          |
| 3  | Meta-Llama-3-8B               |        |          |
| 4  | Llama-3.1-8B                  |        |          |
| 5  | Meta-Llama-3.1-8B-Instruct    |        |          |
| 6  | Llama-3.2-1B                  |        |          |
| 7  | Llama-3.2-1B-Instruct         |        |          |
| 8  | Llama-3.2-3B                  |        |          |
| 9  | Llama-3.2-3B-Instruct         |        |          |
| 10 | CodeLlama-7b-Instruct-hf      |        |          |
| 11 | DeepSeek-R1-Distill-Llama-8B  |        |          |
| 12 | DeepSeek-R1-Distill-Qwen-1.5B |        |          |
| 13 | Qwen-2.5-1.5B-Instruct        |        |          |
| 14 | DeepSeek-R1-Distill-Qwen-7B   |        |          |
| 15 | Phi-3-mini-4k-instruct        |        |          |
| 16 | Phi-3-mini-128k-instruct      |        |          |
| 17 | Phi-3.5-mini-instruct         |        |          |
| 18 | Phi-4-mini-instruct           |        |          |
| 19 | Phi-4-mini-reasoning          |        |          |
| 20 | gemma-2-2b                    |        |          |
| 21 | Mistral-7B-Instruct-v0.1      |        |          |
| 22 | Mistral-7B-Instruct-v0.2      |        |          |
| 23 | Mistral-7B-Instruct-v0.3      |        |          |
| 24 | Mistral-7B-v0.3               |        |          |
| 25 | AMD-OLMo-1B-SFT-DPO           |        |          |
| 26 | chatglm3-6b                   |        |          |
| 27 | Qwen1.5-7B-Chat               |        |          |
| 28 | Qwen2-1.5B                    |        |          |
| 29 | Qwen2-7B                      |        |          |
| 30 | Qwen2.5-0.5B-Instruct         |        |          |
| 31 | Qwen2.5-7B-Instruct           |        |          |
| 32 | Qwen2.5-Coder-0.5B-Instruct   |        |          |
| 33 | Qwen2.5-Coder-1.5B-Instruct   |        |          |
| 34 | Qwen2.5-Coder-7B-Instruct     |        |          |
| 35 | Qwen2.5-3B-Instruct           |        |          |
| 36 | Qwen3-1.7B                    |        |          |
| 37 | Qwen3-4B                      |        |          |
| 38 | Qwen3-8B                      |        |          |

Notes

  1. All models are supported up to a 4K context length, with the following exceptions (a sketch for enforcing these limits follows this list):

  • AMD-OLMo-1B-SFT-DPO: inherently supports only a 2K context length

  • gemma-2-2b: supports up to a 3K context length
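
The context-length limits above translate directly into a prompt-budget check. The following Python sketch is illustrative only: the `MAX_CONTEXT_OVERRIDES` table and the `check_prompt_fits` helper are hypothetical names, not part of any shipped API, and it assumes 1K = 1024 tokens (so 4K = 4096, 3K = 3072, 2K = 2048).

```python
# Hedged sketch: enforce the per-model context-length limits from the notes.
# MAX_CONTEXT_OVERRIDES and check_prompt_fits are illustrative names, not a
# shipped API. Token counts assume 1K = 1024 tokens.

DEFAULT_MAX_CONTEXT = 4096  # 4K limit for all listed models unless overridden

# Exceptions listed in the notes above.
MAX_CONTEXT_OVERRIDES = {
    "AMD-OLMo-1B-SFT-DPO": 2048,  # 2K
    "gemma-2-2b": 3072,           # 3K
}


def max_context(model_name: str) -> int:
    """Return the maximum supported context length, in tokens, for a model."""
    return MAX_CONTEXT_OVERRIDES.get(model_name, DEFAULT_MAX_CONTEXT)


def check_prompt_fits(model_name: str, prompt_tokens: int, max_new_tokens: int) -> None:
    """Raise ValueError if the prompt plus generation budget exceeds the window."""
    limit = max_context(model_name)
    if prompt_tokens + max_new_tokens > limit:
        raise ValueError(
            f"{model_name}: {prompt_tokens} prompt + {max_new_tokens} new tokens "
            f"exceeds the {limit}-token context limit"
        )


if __name__ == "__main__":
    # Fits: 3000 + 512 <= 4096 for a default 4K model.
    check_prompt_fits("Llama-3.2-1B-Instruct", prompt_tokens=3000, max_new_tokens=512)
    try:
        # Exceeds: 3000 + 512 > 3072 for gemma-2-2b.
        check_prompt_fits("gemma-2-2b", prompt_tokens=3000, max_new_tokens=512)
    except ValueError as err:
        print(err)
```

Keeping the default in one constant and the exceptions in a small override table mirrors the structure of the notes: new models need no entry unless they deviate from the 4K default.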