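M50 Model Performance Data
This document presents the resource usage and inference performance metrics of LLM and VLM models, together with Prefill- and Decode-stage throughput trend charts, to support model performance evaluation and comparison.
Test Environment
The LLM/VLM performance tests were run in the following environment:
Software platform: Ubuntu 24.04 LTS Docker image provided by the Houmo software platform;
Processor: Intel Core i7-12700, 64 GB RAM;
AI accelerator card: Houmo LQ50 M.2, 24 GB.
Note: All tests were run in this environment, and the results apply only to identical or similar hardware configurations. Different hardware or system environments may affect performance.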
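LLM/VLM Model Performance Metrics
The table below lists the inference performance of each LLM/VLM model under different run configurations, including Prefill and Decode throughput and vision frame rate, so that models can be compared under the same conditions.
The fields are defined as follows:
Model: Model name, e.g., Qwen3, Qwen3-VL.
Size: Number of model parameters, e.g., 8B, 30B-A3B.
PrefillLen(k): Number of input tokens processed by the Prefill stage, in thousands of tokens.
Ctx(k): Context length, in thousands of tokens.
Batch: Inference batch size.
NChip: Number of M50 chips used.
Input(k): Number of input tokens, in thousands of tokens.
Output(k): Number of output tokens, in thousands of tokens.
Prefill(tps): Prefill-stage throughput when processing the input prompt, i.e., tokens processed per second.
Decode(tps): Decode-stage throughput when generating the output sequence, i.e., tokens generated per second.
Vision(fps): Frame rate of the vision stage, i.e., frames processed per second; applies only to VLM models and is 0 for text-only models.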
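The Prefill and Decode throughputs can be combined into a rough end-to-end latency estimate for a single request. The sketch below is an illustration under stated assumptions, not a Houmo tool: `estimate_latency_s` is a hypothetical helper; it assumes the two stages run back to back, that Prefill(tps) holds for the whole input and Decode(tps) for every output token, and it ignores vision encoding and scheduling overhead. It takes 1k as 1000 tokens, which is also an assumption (the document does not say whether k means 1000 or 1024). Use Batch 1 rows, since for larger batches the document does not state whether Decode(tps) is per request or aggregate.

```python
def estimate_latency_s(input_k: float, output_k: float,
                       prefill_tps: float, decode_tps: float,
                       tokens_per_k: int = 1000) -> float:
    """Rough single-request latency: prefill time plus decode time.

    Assumptions: stages run sequentially, table throughputs hold for the
    whole request, 1k = tokens_per_k tokens; vision and overhead ignored.
    """
    if prefill_tps <= 0 or decode_tps <= 0:
        raise ValueError("throughput values must be positive")
    prefill_s = input_k * tokens_per_k / prefill_tps   # time to ingest the prompt
    decode_s = output_k * tokens_per_k / decode_tps    # time to generate the output
    return prefill_s + decode_s


# Example row: Qwen3 8B, Ctx 32k, Batch 1, NChip 1, Input 1k, Output 0.05k
# (Prefill 2274.67 tps, Decode 21.56 tps in the table).
latency = estimate_latency_s(1, 0.05, 2274.67, 21.56)
print(f"{latency:.2f} s")
```

For this row the estimate is about 0.44 s of Prefill plus about 2.32 s of Decode, which shows that with short prompts and the 0.05k output used throughout the table, Decode dominates the total.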
| Model | Size | PrefillLen(k) | Ctx(k) | Batch | NChip | Input(k) | Output(k) | Prefill(tps) | Decode(tps) | Vision(fps) |
|---|---|---|---|---|---|---|---|---|---|---|
| CoPaw-Flash | 9B | 0.25 | 8 | 1 | 1 | 0.25 | 0.05 | 1650.66 | 20.64 | 0 |
| CoPaw-Flash | 9B | 0.25 | 8 | 1 | 1 | 0.5 | 0.05 | 1649.58 | 20.61 | 0 |
| CoPaw-Flash | 9B | 0.25 | 8 | 1 | 1 | 1 | 0.05 | 1639.31 | 15.92 | 0 |
| CoPaw-Flash | 9B | 0.25 | 8 | 1 | 1 | 2 | 0.05 | 1631.89 | 20.18 | 0 |
| CoPaw-Flash | 9B | 0.25 | 8 | 1 | 1 | 4 | 0.05 | 1616.38 | 20.24 | 0 |
| CoPaw-Flash | 9B | 0.25 | 8 | 1 | 1 | 7.95 | 0.05 | 1569.39 | 20.09 | 0 |
| DeepSeek-R1-Qwen3 | 8B | 0.25 | 32 | 1 | 1 | 0.25 | 0.05 | 2453.06 | 22.36 | 0 |
| DeepSeek-R1-Qwen3 | 8B | 0.25 | 32 | 1 | 1 | 0.5 | 0.05 | 2437.41 | 22.02 | 0 |
| DeepSeek-R1-Qwen3 | 8B | 0.25 | 32 | 1 | 1 | 1 | 0.05 | 2407.77 | 21.48 | 0 |
| DeepSeek-R1-Qwen3 | 8B | 0.25 | 32 | 1 | 1 | 2 | 0.05 | 2240.06 | 21.02 | 0 |
| DeepSeek-R1-Qwen3 | 8B | 0.25 | 32 | 1 | 1 | 4 | 0.05 | 1986.44 | 20.14 | 0 |
| DeepSeek-R1-Qwen3 | 8B | 0.25 | 32 | 1 | 1 | 8 | 0.05 | 1630.19 | 18.6 | 0 |
| DeepSeek-R1-Qwen3 | 8B | 0.25 | 32 | 1 | 1 | 16 | 0.05 | 1202.33 | 16.1 | 0 |
| DeepSeek-R1-Qwen3 | 8B | 0.25 | 32 | 1 | 1 | 31.95 | 0.05 | 787.71 | 13.12 | 0 |
| DeepSeek-R1-Qwen3 | 8B | 0.25 | 32 | 2 | 1 | 0.25 | 0.05 | 2451.89 | 38.01 | 0 |
| DeepSeek-R1-Qwen3 | 8B | 0.25 | 32 | 2 | 1 | 0.5 | 0.05 | 2437.3 | 37.12 | 0 |
| DeepSeek-R1-Qwen3 | 8B | 0.25 | 32 | 2 | 1 | 1 | 0.05 | 2403.58 | 35.87 | 0 |
| DeepSeek-R1-Qwen3 | 8B | 0.25 | 32 | 2 | 1 | 2 | 0.05 | 2235.98 | 34.6 | 0 |
| DeepSeek-R1-Qwen3 | 8B | 0.25 | 32 | 2 | 1 | 4 | 0.05 | 1985.43 | 32.28 | 0 |
| DeepSeek-R1-Qwen3 | 8B | 0.25 | 32 | 2 | 1 | 8 | 0.05 | 1629.2 | 28.51 | 0 |
| DeepSeek-R1-Qwen3 | 8B | 0.25 | 32 | 2 | 1 | 16 | 0.05 | 1202.36 | 23.09 | 0 |
| DeepSeek-R1-Qwen3 | 8B | 0.25 | 32 | 2 | 1 | 31.95 | 0.05 | 787.55 | 17.48 | 0 |
| GPT-OSS | 20B | 0.25 | 256 | 1 | 1 | 0.25 | 0.05 | 1432.48 | 33.34 | 0 |
| GPT-OSS | 20B | 0.25 | 256 | 1 | 1 | 0.5 | 0.05 | 1441.67 | 33.15 | 0 |
| GPT-OSS | 20B | 0.25 | 256 | 1 | 1 | 1 | 0.05 | 1452.86 | 31.92 | 0 |
| GPT-OSS | 20B | 0.25 | 256 | 1 | 1 | 2 | 0.05 | 1419.66 | 30.68 | 0 |
| GPT-OSS | 20B | 0.25 | 256 | 1 | 1 | 4 | 0.05 | 1351.18 | 28.6 | 0 |
| GPT-OSS | 20B | 0.25 | 256 | 1 | 1 | 8 | 0.05 | 1219.18 | 25.2 | 0 |
| GPT-OSS | 20B | 0.25 | 256 | 1 | 1 | 16 | 0.05 | 1025.09 | 20.39 | 0 |
| GPT-OSS | 20B | 0.25 | 256 | 1 | 1 | 32 | 0.05 | 776.42 | 14.76 | 0 |
| GPT-OSS | 20B | 0.25 | 256 | 1 | 1 | 64 | 0.05 | 523.59 | 9.52 | 0 |
| GPT-OSS | 20B | 0.25 | 256 | 1 | 1 | 128 | 0.05 | 317.12 | 5.57 | 0 |
| GPT-OSS | 20B | 0.25 | 256 | 1 | 1 | 255.95 | 0.05 | 176.67 | 3.05 | 0 |
| GPT-OSS | 20B | 0.25 | 32 | 1 | 1 | 0.25 | 0.05 | 1439.58 | 33.42 | 0 |
| GPT-OSS | 20B | 0.25 | 32 | 1 | 1 | 0.5 | 0.05 | 1461.51 | 33.07 | 0 |
| GPT-OSS | 20B | 0.25 | 32 | 1 | 1 | 1 | 0.05 | 1463.01 | 31.93 | 0 |
| GPT-OSS | 20B | 0.25 | 32 | 1 | 1 | 2 | 0.05 | 1427.05 | 30.64 | 0 |
| GPT-OSS | 20B | 0.25 | 32 | 1 | 1 | 4 | 0.05 | 1357.86 | 28.59 | 0 |
| GPT-OSS | 20B | 0.25 | 32 | 1 | 1 | 8 | 0.05 | 1221.69 | 25.23 | 0 |
| GPT-OSS | 20B | 0.25 | 32 | 1 | 1 | 16 | 0.05 | 1027.97 | 20.39 | 0 |
| GPT-OSS | 20B | 0.25 | 32 | 1 | 1 | 31.95 | 0.05 | 770.61 | 14.94 | 0 |
| Qwen2.5-VL | 7B | 0.25 | 8 | 1 | 1 | 0.25 | 0.05 | 1169.48 | 22.35 | 5.23 |
| Qwen2.5-VL | 7B | 0.25 | 8 | 1 | 1 | 0.5 | 0.05 | 1166.63 | 22.25 | 5.24 |
| Qwen2.5-VL | 7B | 0.25 | 8 | 1 | 1 | 1 | 0.05 | 1166.53 | 21.92 | 5.24 |
| Qwen2.5-VL | 7B | 0.25 | 8 | 1 | 1 | 2 | 0.05 | 1115.22 | 21.34 | 5.21 |
| Qwen2.5-VL | 7B | 0.25 | 8 | 1 | 1 | 4 | 0.05 | 1023.32 | 19.19 | 5.24 |
| Qwen2.5-VL | 7B | 0.25 | 8 | 1 | 1 | 7.95 | 0.05 | 868.83 | 17.48 | 5.24 |
| Qwen2.5 | 7B | 0.25 | 8 | 1 | 1 | 0.25 | 0.05 | 1300.12 | 25.11 | 0 |
| Qwen2.5 | 7B | 0.25 | 8 | 1 | 1 | 0.5 | 0.05 | 1301 | 25.02 | 0 |
| Qwen2.5 | 7B | 0.25 | 8 | 1 | 1 | 1 | 0.05 | 1301.43 | 24.82 | 0 |
| Qwen2.5 | 7B | 0.25 | 8 | 1 | 1 | 2 | 0.05 | 1260.61 | 24.52 | 0 |
| Qwen2.5 | 7B | 0.25 | 8 | 1 | 1 | 4 | 0.05 | 1180.77 | 22.24 | 0 |
| Qwen2.5 | 7B | 0.25 | 8 | 1 | 1 | 7.95 | 0.05 | 1034.31 | 20.64 | 0 |
| Qwen2.5 | 7B | 0.25 | 8 | 2 | 1 | 0.25 | 0.05 | 1299.78 | 37.63 | 0 |
| Qwen2.5 | 7B | 0.25 | 8 | 2 | 1 | 0.5 | 0.05 | 1300.75 | 37.53 | 0 |
| Qwen2.5 | 7B | 0.25 | 8 | 2 | 1 | 1 | 0.05 | 1299.67 | 37.01 | 0 |
| Qwen2.5 | 7B | 0.25 | 8 | 2 | 1 | 2 | 0.05 | 1260.31 | 36.33 | 0 |
| Qwen2.5 | 7B | 0.25 | 8 | 2 | 1 | 4 | 0.05 | 1179.95 | 34.88 | 0 |
| Qwen2.5 | 7B | 0.25 | 8 | 2 | 1 | 7.95 | 0.05 | 1033.79 | 32.29 | 0 |
| Qwen3-VL | 30b_a3b | 0.25 | 8 | 1 | 1 | 0.25 | 0.05 | 535.53 | 28.54 | 8.23 |
| Qwen3-VL | 30b_a3b | 0.25 | 8 | 1 | 1 | 0.5 | 0.05 | 534.36 | 28.44 | 8.25 |
| Qwen3-VL | 30b_a3b | 0.25 | 8 | 1 | 1 | 1 | 0.05 | 537.87 | 27.94 | 8.21 |
| Qwen3-VL | 30b_a3b | 0.25 | 8 | 1 | 1 | 2 | 0.05 | 524.82 | 25.76 | 8.21 |
| Qwen3-VL | 30b_a3b | 0.25 | 8 | 1 | 1 | 4 | 0.05 | 488.06 | 22.14 | 8.23 |
| Qwen3-VL | 30b_a3b | 0.25 | 8 | 1 | 1 | 7.95 | 0.05 | 434.96 | 19.59 | 8.24 |
| Qwen3-VL | 4B | 0.25 | 32 | 1 | 1 | 0.25 | 0.05 | 3015.81 | 30.24 | 16.67 |
| Qwen3-VL | 4B | 0.25 | 32 | 1 | 1 | 0.5 | 0.05 | 2986.93 | 29.67 | 16.7 |
| Qwen3-VL | 4B | 0.25 | 32 | 1 | 1 | 1 | 0.05 | 2942.91 | 28.76 | 16.79 |
| Qwen3-VL | 4B | 0.25 | 32 | 1 | 1 | 2 | 0.05 | 2694.4 | 27.75 | 16.7 |
| Qwen3-VL | 4B | 0.25 | 32 | 1 | 1 | 4 | 0.05 | 2333.26 | 26.17 | 16.75 |
| Qwen3-VL | 4B | 0.25 | 32 | 1 | 1 | 8 | 0.05 | 1853.63 | 23.41 | 16.75 |
| Qwen3-VL | 4B | 0.25 | 32 | 1 | 1 | 16 | 0.05 | 1318.71 | 19.3 | 16.8 |
| Qwen3-VL | 4B | 0.25 | 32 | 1 | 1 | 31.95 | 0.05 | 812.86 | 14.9 | 16.77 |
| Qwen3-VL | 8B | 0.25 | 32 | 1 | 1 | 0.25 | 0.05 | 2181.26 | 21.44 | 10.99 |
| Qwen3-VL | 8B | 0.25 | 32 | 1 | 1 | 0.5 | 0.05 | 2167.41 | 21.11 | 11.01 |
| Qwen3-VL | 8B | 0.25 | 32 | 1 | 1 | 1 | 0.05 | 2131.02 | 20.63 | 11 |
| Qwen3-VL | 8B | 0.25 | 32 | 1 | 1 | 2 | 0.05 | 1999.02 | 20.16 | 11.01 |
| Qwen3-VL | 8B | 0.25 | 32 | 1 | 1 | 4 | 0.05 | 1794.95 | 19.31 | 11 |
| Qwen3-VL | 8B | 0.25 | 32 | 1 | 1 | 8 | 0.05 | 1497.96 | 17.75 | 10.98 |
| Qwen3-VL | 8B | 0.25 | 32 | 1 | 1 | 16 | 0.05 | 1128.48 | 15.3 | 11.01 |
| Qwen3-VL | 8B | 0.25 | 32 | 1 | 1 | 31.95 | 0.05 | 754.68 | 12.38 | 10.96 |
| Qwen3.5 | 0.8B | 0.25 | 256 | 1 | 1 | 0.25 | 0.05 | 6690.44 | 110.5 | 2.01 |
| Qwen3.5 | 0.8B | 0.25 | 256 | 1 | 1 | 0.5 | 0.05 | 6677.75 | 109.73 | 2.01 |
| Qwen3.5 | 0.8B | 0.25 | 256 | 1 | 1 | 1 | 0.05 | 6664.77 | 105.83 | 2.01 |
| Qwen3.5 | 0.8B | 0.25 | 256 | 1 | 1 | 2 | 0.05 | 6620.46 | 103.7 | 2.01 |
| Qwen3.5 | 0.8B | 0.25 | 256 | 1 | 1 | 4 | 0.05 | 6480.96 | 101.26 | 2.01 |
| Qwen3.5 | 0.8B | 0.25 | 256 | 1 | 1 | 8 | 0.05 | 6251.86 | 95.87 | 2.01 |
| Qwen3.5 | 0.8B | 0.25 | 256 | 1 | 1 | 16 | 0.05 | 5863.87 | 86.99 | 2.01 |
| Qwen3.5 | 0.8B | 0.25 | 256 | 1 | 1 | 32 | 0.05 | 5212.83 | 74.22 | 2.01 |
| Qwen3.5 | 0.8B | 0.25 | 256 | 1 | 1 | 64 | 0.05 | 4270.56 | 56.36 | 2.01 |
| Qwen3.5 | 0.8B | 0.25 | 256 | 1 | 1 | 128 | 0.05 | 3139.62 | 38.44 | 2.01 |
| Qwen3.5 | 0.8B | 0.25 | 256 | 1 | 1 | 255.95 | 0.05 | 2054.41 | 23.84 | 2.01 |
| Qwen3.5 | 2B | 0.25 | 256 | 1 | 1 | 0.25 | 0.05 | 5187.01 | 65.85 | 1.4 |
| Qwen3.5 | 2B | 0.25 | 256 | 1 | 1 | 0.5 | 0.05 | 5156.94 | 65.76 | 1.4 |
| Qwen3.5 | 2B | 0.25 | 256 | 1 | 1 | 1 | 0.05 | 5150.75 | 64.52 | 1.4 |
| Qwen3.5 | 2B | 0.25 | 256 | 1 | 1 | 2 | 0.05 | 5122.19 | 63.69 | 1.4 |
| Qwen3.5 | 2B | 0.25 | 256 | 1 | 1 | 4 | 0.05 | 5049.64 | 62.39 | 1.4 |
| Qwen3.5 | 2B | 0.25 | 256 | 1 | 1 | 8 | 0.05 | 4905.31 | 60.82 | 1.4 |
| Qwen3.5 | 2B | 0.25 | 256 | 1 | 1 | 16 | 0.05 | 4662.94 | 56.89 | 1.4 |
| Qwen3.5 | 2B | 0.25 | 256 | 1 | 1 | 32 | 0.05 | 4238.52 | 50.98 | 1.4 |
| Qwen3.5 | 2B | 0.25 | 256 | 1 | 1 | 64 | 0.05 | 3595.36 | 41.98 | 1.4 |
| Qwen3.5 | 2B | 0.25 | 256 | 1 | 1 | 128 | 0.05 | 2758.95 | 31.26 | 1.4 |
| Qwen3.5 | 2B | 0.25 | 256 | 1 | 1 | 255.95 | 0.05 | 1884.24 | 20.8 | 1.4 |
| Qwen3.5 | 4B | 0.25 | 256 | 1 | 1 | 0.25 | 0.05 | 2303 | 31.81 | 1.4 |
| Qwen3.5 | 4B | 0.25 | 256 | 1 | 1 | 0.5 | 0.05 | 2299.06 | 31.7 | 1.4 |
| Qwen3.5 | 4B | 0.25 | 256 | 1 | 1 | 1 | 0.05 | 2295.52 | 31.31 | 1.4 |
| Qwen3.5 | 4B | 0.25 | 256 | 1 | 1 | 2 | 0.05 | 2273.18 | 31 | 1.4 |
| Qwen3.5 | 4B | 0.25 | 256 | 1 | 1 | 4 | 0.05 | 2234.48 | 30.52 | 1.4 |
| Qwen3.5 | 4B | 0.25 | 256 | 1 | 1 | 8 | 0.05 | 2164.45 | 29.58 | 1.4 |
| Qwen3.5 | 4B | 0.25 | 256 | 1 | 1 | 16 | 0.05 | 2034.1 | 27.99 | 1.4 |
| Qwen3.5 | 4B | 0.25 | 256 | 1 | 1 | 32 | 0.05 | 1819.19 | 25.32 | 1.4 |
| Qwen3.5 | 4B | 0.25 | 256 | 1 | 1 | 64 | 0.05 | 1500.94 | 21.26 | 1.4 |
| Qwen3.5 | 4B | 0.25 | 256 | 1 | 1 | 128 | 0.05 | 1113.01 | 16.07 | 1.4 |
| Qwen3.5 | 4B | 0.25 | 256 | 1 | 1 | 255.95 | 0.05 | 734.65 | 10.84 | 1.4 |
| Qwen3.5 | 9B | 0.25 | 256 | 1 | 1 | 0.25 | 0.05 | 1791.45 | 21.31 | 0.87 |
| Qwen3.5 | 9B | 0.25 | 256 | 1 | 1 | 0.5 | 0.05 | 1788.27 | 21.25 | 0.87 |
| Qwen3.5 | 9B | 0.25 | 256 | 1 | 1 | 1 | 0.05 | 1786.97 | 21.11 | 0.87 |
| Qwen3.5 | 9B | 0.25 | 256 | 1 | 1 | 2 | 0.05 | 1772.91 | 20.95 | 0.87 |
| Qwen3.5 | 9B | 0.25 | 256 | 1 | 1 | 4 | 0.05 | 1749.84 | 20.75 | 0.87 |
| Qwen3.5 | 9B | 0.25 | 256 | 1 | 1 | 8 | 0.05 | 1706.11 | 20.3 | 0.87 |
| Qwen3.5 | 9B | 0.25 | 256 | 1 | 1 | 16 | 0.05 | 1624.98 | 19.56 | 0.87 |
| Qwen3.5 | 9B | 0.25 | 256 | 1 | 1 | 32 | 0.05 | 1484.45 | 18.26 | 0.87 |
| Qwen3.5 | 9B | 0.25 | 256 | 1 | 1 | 64 | 0.05 | 1265.55 | 16.06 | 0.87 |
| Qwen3.5 | 9B | 0.25 | 256 | 1 | 1 | 128 | 0.05 | 977.97 | 12.96 | 0.87 |
| Qwen3.5 | 9B | 0.25 | 256 | 1 | 1 | 255.95 | 0.05 | 673.06 | 9.41 | 0.87 |
| Qwen3.6 | 35B-A3B | 0.25 | 256 | 1 | 2 | 0.25 | 0.05 | 859.97 | 34.7 | 0.87 |
| Qwen3.6 | 35B-A3B | 0.25 | 256 | 1 | 2 | 0.5 | 0.05 | 861 | 34.61 | 0.87 |
| Qwen3.6 | 35B-A3B | 0.25 | 256 | 1 | 2 | 1 | 0.05 | 856.94 | 34.22 | 0.87 |
| Qwen3.6 | 35B-A3B | 0.25 | 256 | 1 | 2 | 2 | 0.05 | 857.46 | 33.66 | 0.87 |
| Qwen3.6 | 35B-A3B | 0.25 | 256 | 1 | 2 | 4 | 0.05 | 856.54 | 33.43 | 0.87 |
| Qwen3.6 | 35B-A3B | 0.25 | 256 | 1 | 2 | 8 | 0.05 | 844.56 | 32.57 | 0.87 |
| Qwen3.6 | 35B-A3B | 0.25 | 256 | 1 | 2 | 16 | 0.05 | 823 | 31.15 | 0.87 |
| Qwen3.6 | 35B-A3B | 0.25 | 256 | 1 | 2 | 32 | 0.05 | 783.06 | 28.66 | 0.87 |
| Qwen3.6 | 35B-A3B | 0.25 | 256 | 1 | 2 | 64 | 0.05 | 711.41 | 24.79 | 0.87 |
| Qwen3.6 | 35B-A3B | 0.25 | 256 | 1 | 2 | 128 | 0.05 | 601.81 | 19.65 | 0.87 |
| Qwen3.6 | 35B-A3B | 0.25 | 256 | 1 | 2 | 255.95 | 0.05 | 460.45 | 13.89 | 0.87 |
| Qwen3.6 | 35B-A3B | 0.25 | 256 | 1 | 1 | 0.25 | 0.05 | 768.17 | 30.08 | 0.87 |
| Qwen3.6 | 35B-A3B | 0.25 | 256 | 1 | 1 | 0.5 | 0.05 | 769.69 | 30.03 | 0.87 |
| Qwen3.6 | 35B-A3B | 0.25 | 256 | 1 | 1 | 1 | 0.05 | 769.08 | 29.56 | 0.87 |
| Qwen3.6 | 35B-A3B | 0.25 | 256 | 1 | 1 | 2 | 0.05 | 765.98 | 29.3 | 0.87 |
| Qwen3.6 | 35B-A3B | 0.25 | 256 | 1 | 1 | 4 | 0.05 | 763.84 | 28.91 | 0.87 |
| Qwen3.6 | 35B-A3B | 0.25 | 256 | 1 | 1 | 8 | 0.05 | 759.04 | 28.25 | 0.87 |
| Qwen3.6 | 35B-A3B | 0.25 | 256 | 1 | 1 | 16 | 0.05 | 737.67 | 27 | 0.87 |
| Qwen3.6 | 35B-A3B | 0.25 | 256 | 1 | 1 | 32 | 0.05 | 705.17 | 24.84 | 0.87 |
| Qwen3.6 | 35B-A3B | 0.25 | 256 | 1 | 1 | 64 | 0.05 | 645.62 | 21.47 | 0.87 |
| Qwen3.6 | 35B-A3B | 0.25 | 256 | 1 | 1 | 128 | 0.05 | 551.62 | 16.89 | 0.87 |
| Qwen3.6 | 35B-A3B | 0.25 | 256 | 1 | 1 | 255.95 | 0.05 | 426.96 | 11.91 | 0.87 |
| Qwen3.6 | 27B | 0.25 | 128 | 1 | 1 | 0.25 | 0.05 | 596.38 | 7.09 | 0.87 |
| Qwen3.6 | 27B | 0.25 | 128 | 1 | 1 | 0.5 | 0.05 | 596.28 | 7.08 | 0.87 |
| Qwen3.6 | 27B | 0.25 | 128 | 1 | 1 | 1 | 0.05 | 595.12 | 7.03 | 0.87 |
| Qwen3.6 | 27B | 0.25 | 128 | 1 | 1 | 2 | 0.05 | 590.65 | 7 | 0.87 |
| Qwen3.6 | 27B | 0.25 | 128 | 1 | 1 | 4 | 0.05 | 582.03 | 6.95 | 0.87 |
| Qwen3.6 | 27B | 0.25 | 128 | 1 | 1 | 8 | 0.05 | 566.09 | 6.84 | 0.87 |
| Qwen3.6 | 27B | 0.25 | 128 | 1 | 1 | 16 | 0.05 | 536.86 | 6.65 | 0.87 |
| Qwen3.6 | 27B | 0.25 | 128 | 1 | 1 | 32 | 0.05 | 486.73 | 6.28 | 0.87 |
| Qwen3.6 | 27B | 0.25 | 128 | 1 | 1 | 64 | 0.05 | 410.21 | 5.69 | 0.87 |
| Qwen3.6 | 27B | 0.25 | 128 | 1 | 1 | 127.95 | 0.05 | 311.97 | 4.82 | 0.87 |
| Qwen3.6 | 27B | 0.25 | 256 | 1 | 2 | 0.25 | 0.05 | 726.61 | 10.33 | 0.87 |
| Qwen3.6 | 27B | 0.25 | 256 | 1 | 2 | 0.5 | 0.05 | 727.84 | 10.34 | 0.87 |
| Qwen3.6 | 27B | 0.25 | 256 | 1 | 2 | 1 | 0.05 | 726.85 | 10.26 | 0.87 |
| Qwen3.6 | 27B | 0.25 | 256 | 1 | 2 | 2 | 0.05 | 722.92 | 10.2 | 0.87 |
| Qwen3.6 | 27B | 0.25 | 256 | 1 | 2 | 4 | 0.05 | 716.4 | 10.13 | 0.87 |
| Qwen3.6 | 27B | 0.25 | 256 | 1 | 2 | 8 | 0.05 | 704.13 | 9.99 | 0.87 |
| Qwen3.6 | 27B | 0.25 | 256 | 1 | 2 | 16 | 0.05 | 681.06 | 9.73 | 0.87 |
| Qwen3.6 | 27B | 0.25 | 256 | 1 | 2 | 32 | 0.05 | 638.78 | 9.25 | 0.87 |
| Qwen3.6 | 27B | 0.25 | 256 | 1 | 2 | 64 | 0.05 | 567.97 | 8.44 | 0.87 |
| Qwen3.6 | 27B | 0.25 | 256 | 1 | 2 | 128 | 0.05 | 465.23 | 7.16 | 0.87 |
| Qwen3.6 | 27B | 0.25 | 256 | 1 | 2 | 255.95 | 0.05 | 341.56 | 5.53 | 0.87 |
| Qwen3 | 0.6B | 0.25 | 32 | 1 | 1 | 0.25 | 0.05 | 10432.43 | 86.75 | 0 |
| Qwen3 | 0.6B | 0.25 | 32 | 1 | 1 | 0.5 | 0.05 | 10172.32 | 82.74 | 0 |
| Qwen3 | 0.6B | 0.25 | 32 | 1 | 1 | 1 | 0.05 | 9773.33 | 77.59 | 0 |
| Qwen3 | 0.6B | 0.25 | 32 | 1 | 1 | 2 | 0.05 | 8743.59 | 72.4 | 0 |
| Qwen3 | 0.6B | 0.25 | 32 | 1 | 1 | 4 | 0.05 | 7171.56 | 64.55 | 0 |
| Qwen3 | 0.6B | 0.25 | 32 | 1 | 1 | 8 | 0.05 | 5345.57 | 53.55 | 0 |
| Qwen3 | 0.6B | 0.25 | 32 | 1 | 1 | 16 | 0.05 | 3544.36 | 39.89 | 0 |
| Qwen3 | 0.6B | 0.25 | 32 | 1 | 1 | 31.95 | 0.05 | 2120.47 | 27.95 | 0 |
| Qwen3 | 1.7B | 0.25 | 32 | 1 | 1 | 0.25 | 0.05 | 7304.76 | 58.3 | 0 |
| Qwen3 | 1.7B | 0.25 | 32 | 1 | 1 | 0.5 | 0.05 | 7177.15 | 56.46 | 0 |
| Qwen3 | 1.7B | 0.25 | 32 | 1 | 1 | 1 | 0.05 | 7055.86 | 53.66 | 0 |
| Qwen3 | 1.7B | 0.25 | 32 | 1 | 1 | 2 | 0.05 | 6461.93 | 51.73 | 0 |
| Qwen3 | 1.7B | 0.25 | 32 | 1 | 1 | 4 | 0.05 | 5565.3 | 47.73 | 0 |
| Qwen3 | 1.7B | 0.25 | 32 | 1 | 1 | 8 | 0.05 | 4400.91 | 41.47 | 0 |
| Qwen3 | 1.7B | 0.25 | 32 | 1 | 1 | 16 | 0.05 | 3099.16 | 32.67 | 0 |
| Qwen3 | 1.7B | 0.25 | 32 | 1 | 1 | 31.95 | 0.05 | 1951.17 | 24.08 | 0 |
| Qwen3 | 14B | 0.25 | 16 | 1 | 1 | 0.25 | 0.05 | 1328.27 | 12.79 | 0 |
| Qwen3 | 14B | 0.25 | 16 | 1 | 1 | 0.5 | 0.05 | 1320.47 | 12.65 | 0 |
| Qwen3 | 14B | 0.25 | 16 | 1 | 1 | 1 | 0.05 | 1302.3 | 12.46 | 0 |
| Qwen3 | 14B | 0.25 | 16 | 1 | 1 | 2 | 0.05 | 1213.72 | 12.3 | 0 |
| Qwen3 | 14B | 0.25 | 16 | 1 | 1 | 4 | 0.05 | 1075.44 | 11.96 | 0 |
| Qwen3 | 14B | 0.25 | 16 | 1 | 1 | 8 | 0.05 | 881.55 | 11.34 | 0 |
| Qwen3 | 14B | 0.25 | 16 | 1 | 1 | 15.95 | 0.05 | 648.62 | 10.58 | 0 |
| Qwen3 | 14B | 0.25 | 32 | 1 | 2 | 0.25 | 0.05 | 1725.17 | 21.89 | 0 |
| Qwen3 | 14B | 0.25 | 32 | 1 | 2 | 0.5 | 0.05 | 1719.87 | 21.68 | 0 |
| Qwen3 | 14B | 0.25 | 32 | 1 | 2 | 1 | 0.05 | 1699.69 | 20.99 | 0 |
| Qwen3 | 14B | 0.25 | 32 | 1 | 2 | 2 | 0.05 | 1622.71 | 20.66 | 0 |
| Qwen3 | 14B | 0.25 | 32 | 1 | 2 | 4 | 0.05 | 1494.97 | 19.99 | 0 |
| Qwen3 | 14B | 0.25 | 32 | 1 | 2 | 8 | 0.05 | 1295.09 | 18.82 | 0 |
| Qwen3 | 14B | 0.25 | 32 | 1 | 2 | 16 | 0.05 | 1026.16 | 16.81 | 0 |
| Qwen3 | 14B | 0.25 | 32 | 1 | 2 | 31.95 | 0.05 | 724.67 | 14.32 | 0 |
| Qwen3 | 30b_a3b | 0.25 | 128 | 1 | 1 | 0.25 | 0.05 | 1179.75 | 31.31 | 0 |
| Qwen3 | 30b_a3b | 0.25 | 128 | 1 | 1 | 0.5 | 0.05 | 1222.4 | 30.78 | 0 |
| Qwen3 | 30b_a3b | 0.25 | 128 | 1 | 1 | 1 | 0.05 | 1244 | 29.04 | 0 |
| Qwen3 | 30b_a3b | 0.25 | 128 | 1 | 1 | 2 | 0.05 | 1204.55 | 27.89 | 0 |
| Qwen3 | 30b_a3b | 0.25 | 128 | 1 | 1 | 4 | 0.05 | 1116.98 | 26.25 | 0 |
| Qwen3 | 30b_a3b | 0.25 | 128 | 1 | 1 | 8 | 0.05 | 976.55 | 23.48 | 0 |
| Qwen3 | 30b_a3b | 0.25 | 128 | 1 | 1 | 16 | 0.05 | 773.49 | 19.46 | 0 |
| Qwen3 | 30b_a3b | 0.25 | 128 | 1 | 1 | 32 | 0.05 | 542.57 | 14.52 | 0 |
| Qwen3 | 30b_a3b | 0.25 | 128 | 1 | 1 | 64 | 0.05 | 339.75 | 9.67 | 0 |
| Qwen3 | 30b_a3b | 0.25 | 128 | 1 | 1 | 127.95 | 0.05 | 193.92 | 5.91 | 0 |
| Qwen3 | 30b_a3b | 0.25 | 32 | 1 | 1 | 0.25 | 0.05 | 1197.97 | 32.47 | 0 |
| Qwen3 | 30b_a3b | 0.25 | 32 | 1 | 1 | 0.5 | 0.05 | 1265.11 | 32 | 0 |
| Qwen3 | 30b_a3b | 0.25 | 32 | 1 | 1 | 1 | 0.05 | 1277.01 | 30.15 | 0 |
| Qwen3 | 30b_a3b | 0.25 | 32 | 1 | 1 | 2 | 0.05 | 1241.68 | 29.16 | 0 |
| Qwen3 | 30b_a3b | 0.25 | 32 | 1 | 1 | 4 | 0.05 | 1143.92 | 27.52 | 0 |
| Qwen3 | 30b_a3b | 0.25 | 32 | 1 | 1 | 8 | 0.05 | 997.34 | 24.78 | 0 |
| Qwen3 | 30b_a3b | 0.25 | 32 | 1 | 1 | 16 | 0.05 | 786.09 | 20.61 | 0 |
| Qwen3 | 30b_a3b | 0.25 | 32 | 1 | 1 | 31.95 | 0.05 | 549.09 | 16.22 | 0 |
| Qwen3 | 4B | 0.25 | 32 | 1 | 1 | 0.25 | 0.05 | 3195.34 | 23.03 | 0 |
| Qwen3 | 4B | 0.25 | 32 | 1 | 1 | 0.5 | 0.05 | 3161.35 | 22.57 | 0 |
| Qwen3 | 4B | 0.25 | 32 | 1 | 1 | 1 | 0.05 | 3098.86 | 22.02 | 0 |
| Qwen3 | 4B | 0.25 | 32 | 1 | 1 | 2 | 0.05 | 2829.65 | 21.54 | 0 |
| Qwen3 | 4B | 0.25 | 32 | 1 | 1 | 4 | 0.05 | 2433.82 | 20.6 | 0 |
| Qwen3 | 4B | 0.25 | 32 | 1 | 1 | 8 | 0.05 | 1919.4 | 19.01 | 0 |
| Qwen3 | 4B | 0.25 | 32 | 1 | 1 | 16 | 0.05 | 1352.64 | 16.4 | 0 |
| Qwen3 | 4B | 0.25 | 32 | 1 | 1 | 31.95 | 0.05 | 849.55 | 13.32 | 0 |
| Qwen3 | 8B | 0.25 | 16 | 4 | 1 | 0.25 | 0.05 | 2312.91 | 57.27 | 0 |
| Qwen3 | 8B | 0.25 | 16 | 4 | 1 | 0.5 | 0.05 | 2296.38 | 55.19 | 0 |
| Qwen3 | 8B | 0.25 | 16 | 4 | 1 | 1 | 0.05 | 2258.9 | 53.32 | 0 |
| Qwen3 | 8B | 0.25 | 16 | 4 | 1 | 2 | 0.05 | 2112.91 | 50.46 | 0 |
| Qwen3 | 8B | 0.25 | 16 | 4 | 1 | 4 | 0.05 | 1886.47 | 45.44 | 0 |
| Qwen3 | 8B | 0.25 | 16 | 4 | 1 | 8 | 0.05 | 1562.4 | 38.05 | 0 |
| Qwen3 | 8B | 0.25 | 16 | 4 | 1 | 15.95 | 0.05 | 1161.1 | 31 | 0 |
| Qwen3 | 8B | 0.25 | 32 | 1 | 1 | 0.25 | 0.05 | 2322.72 | 22.45 | 0 |
| Qwen3 | 8B | 0.25 | 32 | 1 | 1 | 0.5 | 0.05 | 2306.02 | 22.07 | 0 |
| Qwen3 | 8B | 0.25 | 32 | 1 | 1 | 1 | 0.05 | 2274.67 | 21.56 | 0 |
| Qwen3 | 8B | 0.25 | 32 | 1 | 1 | 2 | 0.05 | 2122.8 | 21.13 | 0 |
| Qwen3 | 8B | 0.25 | 32 | 1 | 1 | 4 | 0.05 | 1892.83 | 20.2 | 0 |
| Qwen3 | 8B | 0.25 | 32 | 1 | 1 | 8 | 0.05 | 1566.29 | 18.67 | 0 |
| Qwen3 | 8B | 0.25 | 32 | 1 | 1 | 16 | 0.05 | 1154.62 | 16.21 | 0 |
| Qwen3 | 8B | 0.25 | 32 | 1 | 1 | 31.95 | 0.05 | 745.13 | 11.5 | 0 |
| Qwen3-Coder | 30b_a3b | 0.25 | 128 | 1 | 1 | 0.25 | 0.05 | 1186.91 | 31.49 | 0 |
| Qwen3-Coder | 30b_a3b | 0.25 | 128 | 1 | 1 | 0.5 | 0.05 | 1201.65 | 30.92 | 0 |
| Qwen3-Coder | 30b_a3b | 0.25 | 128 | 1 | 1 | 1 | 0.05 | 1207.09 | 29.17 | 0 |
| Qwen3-Coder | 30b_a3b | 0.25 | 128 | 1 | 1 | 2 | 0.05 | 1175.24 | 28.02 | 0 |
| Qwen3-Coder | 30b_a3b | 0.25 | 128 | 1 | 1 | 4 | 0.05 | 1101.61 | 26.34 | 0 |
| Qwen3-Coder | 30b_a3b | 0.25 | 128 | 1 | 1 | 8 | 0.05 | 969.98 | 23.55 | 0 |
| Qwen3-Coder | 30b_a3b | 0.25 | 128 | 1 | 1 | 16 | 0.05 | 767.18 | 19.57 | 0 |
| Qwen3-Coder | 30b_a3b | 0.25 | 128 | 1 | 1 | 32 | 0.05 | 543.1 | 14.53 | 0 |
| Qwen3-Coder | 30b_a3b | 0.25 | 128 | 1 | 1 | 64 | 0.05 | 341.49 | 9.67 | 0 |
| Qwen3-Coder | 30b_a3b | 0.25 | 128 | 1 | 1 | 127.95 | 0.05 | 188.02 | 5.91 | 0 |
| Qwen3-Coder | 30b_a3b | 0.25 | 128 | 1 | 2 | 0.25 | 0.05 | 1800.19 | 38.64 | 0 |
| Qwen3-Coder | 30b_a3b | 0.25 | 128 | 1 | 2 | 0.5 | 0.05 | 1849.71 | 38.29 | 0 |
| Qwen3-Coder | 30b_a3b | 0.25 | 128 | 1 | 2 | 1 | 0.05 | 1807.37 | 35.26 | 0 |
| Qwen3-Coder | 30b_a3b | 0.25 | 128 | 1 | 2 | 2 | 0.05 | 1780.94 | 34.19 | 0 |
| Qwen3-Coder | 30b_a3b | 0.25 | 128 | 1 | 2 | 4 | 0.05 | 1660.24 | 32.41 | 0 |
| Qwen3-Coder | 30b_a3b | 0.25 | 128 | 1 | 2 | 8 | 0.05 | 1497.1 | 29.12 | 0 |
| Qwen3-Coder | 30b_a3b | 0.25 | 128 | 1 | 2 | 16 | 0.05 | 1253.95 | 24.29 | 0 |
| Qwen3-Coder | 30b_a3b | 0.25 | 128 | 1 | 2 | 32 | 0.05 | 937.23 | 18.22 | 0 |
| Qwen3-Coder | 30b_a3b | 0.25 | 128 | 1 | 2 | 64 | 0.05 | 622.84 | 12.19 | 0 |
| Qwen3-Coder | 30b_a3b | 0.25 | 128 | 1 | 2 | 127.95 | 0.05 | 359.7 | 7.46 | 0 |
| Qwen3-Coder | 30b_a3b | 0.25 | 256 | 1 | 2 | 0.25 | 0.05 | 1764.65 | 38.36 | 0 |
| Qwen3-Coder | 30b_a3b | 0.25 | 256 | 1 | 2 | 0.5 | 0.05 | 1779.39 | 37.99 | 0 |
| Qwen3-Coder | 30b_a3b | 0.25 | 256 | 1 | 2 | 1 | 0.05 | 1817.58 | 35.41 | 0 |
| Qwen3-Coder | 30b_a3b | 0.25 | 256 | 1 | 2 | 2 | 0.05 | 1783.77 | 34.11 | 0 |
| Qwen3-Coder | 30b_a3b | 0.25 | 256 | 1 | 2 | 4 | 0.05 | 1662.99 | 32.34 | 0 |
| Qwen3-Coder | 30b_a3b | 0.25 | 256 | 1 | 2 | 8 | 0.05 | 1513.65 | 29.05 | 0 |
| Qwen3-Coder | 30b_a3b | 0.25 | 256 | 1 | 2 | 16 | 0.05 | 1249.73 | 24.21 | 0 |
| Qwen3-Coder | 30b_a3b | 0.25 | 256 | 1 | 2 | 32 | 0.05 | 937.94 | 18.23 | 0 |
| Qwen3-Coder | 30b_a3b | 0.25 | 256 | 1 | 2 | 64 | 0.05 | 623.4 | 12.21 | 0 |
| Qwen3-Coder | 30b_a3b | 0.25 | 256 | 1 | 2 | 128 | 0.05 | 359.65 | 7.31 | 0 |
| Qwen3-Coder | 30b_a3b | 0.25 | 256 | 1 | 2 | 255.95 | 0.05 | 199.77 | 4.11 | 0 |
| Gemma 4 | 26b_a4b | 0.25 | 128 | 1 | 1 | 0.25 | 0.05 | 1317.17 | 27.05 | 0 |
| Gemma 4 | 26b_a4b | 0.25 | 128 | 1 | 1 | 1 | 0.05 | 1401.93 | 26.02 | 0 |
| Gemma 4 | 26b_a4b | 0.25 | 128 | 1 | 1 | 2 | 0.05 | 1330.96 | 24.98 | 0 |
| Gemma 4 | 26b_a4b | 0.25 | 128 | 1 | 1 | 4 | 0.05 | 1342.04 | 23.16 | 0 |
| Gemma 4 | 26b_a4b | 0.25 | 128 | 1 | 1 | 8 | 0.05 | 1340.28 | 19.94 | 0 |
| Gemma 4 | 26b_a4b | 0.25 | 128 | 1 | 1 | 16 | 0.05 | 1241.37 | 15.34 | 0 |
| Gemma 4 | 26b_a4b | 0.25 | 128 | 1 | 1 | 48 | 0.05 | 999.17 | 8.38 | 0 |
| Gemma 4 | 26b_a4b | 0.25 | 128 | 1 | 1 | 100 | 0.05 | 548.92 | 1.58 | 0 |
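Rows that differ in only one setting can be compared directly. The sketch below hard-codes three Qwen3-Coder 30b_a3b rows from the table above (Ctx 128k, Batch 1, Output 0.05k) for NChip 1 and NChip 2 and computes the 2-chip over 1-chip throughput ratio at each input length. The dictionary layout and the `speedup` helper are illustrative choices, not part of any Houmo tooling.

```python
# input_k -> (prefill_tps, decode_tps), copied from the table above.
# Qwen3-Coder 30b_a3b, Ctx 128k, Batch 1, Output 0.05k.
one_chip = {
    0.25: (1186.91, 31.49),
    8: (969.98, 23.55),
    127.95: (188.02, 5.91),
}
two_chip = {
    0.25: (1800.19, 38.64),
    8: (1497.1, 29.12),
    127.95: (359.7, 7.46),
}


def speedup(base: dict, other: dict) -> dict:
    """Per-input ratio other/base for Prefill and Decode throughput."""
    result = {}
    for input_k in sorted(base.keys() & other.keys()):  # inputs present in both
        (p0, d0), (p1, d1) = base[input_k], other[input_k]
        result[input_k] = (p1 / p0, d1 / d0)
    return result


for input_k, (prefill_x, decode_x) in speedup(one_chip, two_chip).items():
    print(f"input {input_k:>7}k  prefill x{prefill_x:.2f}  decode x{decode_x:.2f}")
```

With these rows the Prefill ratio grows from about 1.52 at 0.25k input to about 1.91 at 127.95k, while the Decode ratio stays between roughly 1.23 and 1.26. The same pattern of pairing rows works for comparing Batch 1 and Batch 2, or Ctx 32k and Ctx 128k, for a given model.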
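LLM/VLM Model Memory Usage
The table below shows the resource usage of each LLM/VLM model under different run configurations, including host memory, device memory, peak host memory, and model load times.
The fields are defined as follows:
Model: Model name, e.g., Qwen3, Qwen3-VL.
Size: Number of model parameters, e.g., 8B, 30B-A3B.
Batch: Inference batch size.
NChip: Number of M50 chips used.
HostMem(MB): Host memory usage, in MB.
DeviceMem(MB): M50 device memory usage, in MB.
HostPeak(MB): Peak host memory usage during inference, in MB.
Prefill(ms): Load time of the Prefill model, in milliseconds.
Decode(ms): Load time of the Decode model, in milliseconds.
Vision(ms): Load time of the vision model, in milliseconds; applies only to VLM models and is 0 for text-only models.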
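Notes
The host memory figures in the table reflect memory usage while the model runs, including memory for model loading, memory for embedding computation on the host, and shared cache that the system allocates automatically to reach optimal performance. During the performance tests, Module::Option::EnableHostLazyLoading (C++ API) or tcim_lite.runtime.Option.enable_host_lazy_loading (Python API) was enabled. It defers allocation and initialization of host-side buffers during model loading, which lowers the average memory usage both during loading and after loading completes.
The memory figures are the actual values observed in this test environment while preserving load speed and inference performance. The minimum memory a model needs to run is usually lower. In particular, enabling Module::Option::EnableIOLazyMode (C++ API) or tcim_lite.runtime.Option.enable_io_lazy_mode (Python API) effectively reduces peak memory during loading, but when memory is insufficient it may significantly increase model load time.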
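To compare your own runs with the HostMem(MB) and HostPeak(MB) columns, a process's resident memory can be read from standard Linux interfaces (the test environment is Ubuntu). This is a sketch, not the method used to produce the table, which the document does not state: it reads the current resident set size (`VmRSS`) and its high-water mark (`VmHWM`) from `/proc/self/status`, and cross-checks the peak with `resource.getrusage`, whose `ru_maxrss` is in kilobytes on Linux. The helper names are hypothetical, and the calls must run inside the same process that loads the model.

```python
import resource


def proc_status_mb(field: str) -> float:
    """Read a memory field such as VmRSS or VmHWM from /proc/self/status (Linux), in MB."""
    with open("/proc/self/status") as status:
        for line in status:
            if line.startswith(field + ":"):
                kb = int(line.split()[1])  # /proc reports these fields in kB
                return kb / 1024
    raise KeyError(field)


def host_memory_mb() -> dict:
    """Current and peak resident memory of this process, in MB."""
    return {
        "current_rss_mb": proc_status_mb("VmRSS"),
        "peak_rss_mb": proc_status_mb("VmHWM"),
        # ru_maxrss is in kilobytes on Linux (bytes on macOS)
        "peak_rusage_mb": resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024,
    }


if __name__ == "__main__":
    before = host_memory_mb()
    # ... load and run the model here (placeholder) ...
    after = host_memory_mb()
    print(before)
    print(after)
```

Because these values include the Python interpreter itself and any other libraries in the process, they are best compared as before/after differences rather than against the table in absolute terms.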
| Model | Size | Batch | NChip | HostMem(MB) | DeviceMem(MB) | HostPeak(MB) | Prefill(ms) | Decode(ms) | Vision(ms) |
|---|---|---|---|---|---|---|---|---|---|
| CoPaw-Flash | 9B | 1 | 1 | 2344.96 | 5114 | 4843.52 | 3343.03 | 535.53 | 0 |
| DeepSeek-R1-Qwen3 | 8B | 1 | 1 | 5816.32 | 6550 | 6471.68 | 3066.46 | 715.05 | 0 |
| DeepSeek-R1-Qwen3 | 8B | 2 | 1 | 8120.32 | 11161 | 8120.32 | 3262.38 | 1264.03 | 0 |
| GPT-OSS | 20B | 1 | 1 | 16732.16 | 19180 | 19087.36 | 5455.42 | 1748.49 | 0 |
| Qwen2.5-VL | 7B | 1 | 1 | 1955.84 | 6479 | 4505.6 | 4645.93 | 271.46 | 1607.1 |
| Qwen2.5 | 7B | 1 | 1 | 1730.56 | 4676 | 3502.08 | 3087.46 | 459.49 | 0 |
| Qwen2.5 | 7B | 2 | 1 | 2068.48 | 5350 | 2805.76 | 1818.94 | 314.25 | 0 |
| Qwen3-VL | 30b_a3b | 1 | 1 | 1781.76 | 21833 | 19978.24 | 34127.43 | 230.78 | 3637.48 |
| Qwen3-VL | 4B | 1 | 1 | 5386.24 | 5237 | 5386.24 | 2144.3 | 649.52 | 369.31 |
| Qwen3-VL | 8B | 1 | 1 | 5847.04 | 7533 | 5847.04 | 2086.83 | 651.01 | 343.66 |
| Qwen3.5 | 0.8B | 1 | 1 | 3645.44 | 2796 | 3860.48 | 673.45 | 383.11 | 315.64 |
| Qwen3.5 | 2B | 1 | 1 | 4136.96 | 3792 | 4792.32 | 933.83 | 397.04 | 729.31 |
| Qwen3.5 | 4B | 1 | 1 | 9574.4 | 7793 | 9912.32 | 2211.92 | 1112.54 | 777.39 |
| Qwen3.5 | 9B | 1 | 1 | 10311.68 | 10613 | 10362.88 | 2938.76 | 1126.18 | 1089.16 |
| Qwen3.6 | 35B-A3B | 1 | 2 | 1034.24 | 19347 | 33474.56 | 16384.56 | 1024.69 | 1153.88 |
| Qwen3.6 | 35B-A3B | 1 | 1 | 1034.24 | 20810 | 19077.12 | 9604.85 | 198.91 | 1211.76 |
| Qwen3.6 | 27B | 1 | 1 | 2539.52 | 16840 | 14766.08 | 11038.96 | 813.15 | 1181.18 |
| Qwen3.6 | 27B | 1 | 2 | 2539.52 | 8874 | 14366.72 | 7406.5 | 58.32 | 1125.1 |
| Qwen3 | 0.6B | 1 | 1 | 3901.44 | 2245 | 3901.44 | 2250.77 | 447.35 | 0 |
| Qwen3 | 1.7B | 1 | 1 | 4188.16 | 2932 | 4188.16 | 4884.46 | 455.46 | 0 |
| Qwen3 | 14B | 1 | 1 | 1505.28 | 7711 | 7536.64 | 5455.75 | 304.55 | 0 |
| Qwen3 | 14B | 1 | 2 | 1505.28 | 3891 | 7936 | 4168.88 | 419.66 | 0 |
| Qwen3 | 30b_a3b | 1 | 1 | 12892.16 | 22137 | 21575.68 | 38898.93 | 1448.67 | 0 |
| Qwen3 | 4B | 1 | 1 | 5365.76 | 6626 | 5365.76 | 3379.21 | 682.35 | 0 |
| Qwen3 | 8B | 4 | 1 | 6963.2 | 11461 | 6963.2 | 15098 | 1310.86 | 0 |
| Qwen3 | 8B | 1 | 1 | 5816.32 | 7990 | 5816.32 | 16455.49 | 750.09 | 0 |
| Qwen3-Coder | 30b_a3b | 1 | 1 | 12892.16 | 22137 | 20715.52 | 38084.04 | 1491.52 | 0 |
| Qwen3-Coder | 30b_a3b | 1 | 2 | 634.88 | 8095 | 15421.44 | 13562.75 | 70.76 | 0 |
| Gemma 4 | 26b_a4b | 1 | 1 | 35352 | | | | | |
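DeviceMem(MB) can be checked against the 24 GB capacity of the LQ50 card listed under Test Environment. The sketch below rests on assumptions the document does not state: that DeviceMem(MB) is reported per chip (the roughly halved values in the 2-chip rows are consistent with that reading, but it is not confirmed), that each M50 chip has the card's full 24 GB available, and that 24 GB means 24 × 1024 MB. The row subset is copied from the table above; `headroom_mb` is a hypothetical helper.

```python
# Assumptions (not stated by the document): DeviceMem(MB) is per chip and
# each chip exposes 24 GB = 24 * 1024 MB of device memory.
CHIP_CAPACITY_MB = 24 * 1024

# (model, size, batch, nchip, device_mem_mb), copied from the table above
rows = [
    ("Qwen3", "8B", 1, 1, 7990),
    ("Qwen3", "30b_a3b", 1, 1, 22137),
    ("Qwen3-VL", "30b_a3b", 1, 1, 21833),
    ("Qwen3.6", "35B-A3B", 1, 1, 20810),
    ("Qwen3.6", "27B", 1, 2, 8874),
]


def headroom_mb(device_mem_mb: float, capacity_mb: float = CHIP_CAPACITY_MB) -> float:
    """Remaining device memory per chip; a negative value means it would not fit."""
    return capacity_mb - device_mem_mb


for model, size, batch, nchip, mem in rows:
    free = headroom_mb(mem)
    print(f"{model} {size} (batch {batch}, {nchip} chip): "
          f"{mem} MB used, {free:.0f} MB headroom ({mem / CHIP_CAPACITY_MB:.0%})")
```

Under these assumptions the single-chip 30B-class configurations use roughly 85% to 90% of the device memory, which leaves little room for larger batches or longer contexts on one chip.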
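LLM/VLM Inference Performance Visualization
Qwen2.5 7B Model
The figure below shows the Prefill- and Decode-stage throughput curves of the Qwen2.5 7B model:
Figure 1 Qwen2.5 7B Prefill and Decode throughput curves
Qwen3 8B Model
The figure below shows the Prefill- and Decode-stage throughput curves of the Qwen3 8B model:
Figure 2 Qwen3 8B Prefill and Decode throughput curves
DeepSeek-R1-Qwen3 8B Model
The figure below shows the Prefill- and Decode-stage throughput curves of the DeepSeek-R1-Qwen3 8B model:
Figure 3 DeepSeek-R1-Qwen3 8B Prefill and Decode throughput curves
Qwen3 14B Model
The figure below shows the Prefill- and Decode-stage throughput curves of the Qwen3 14B model:
Figure 4 Qwen3 14B Prefill and Decode throughput curves
Qwen3 30B-A3B Model
The figure below shows the Prefill- and Decode-stage throughput curves of the Qwen3 30B-A3B model:
Figure 5 Qwen3 30B-A3B Prefill and Decode throughput curves
Qwen3.5 2B Model
The figure below shows the Prefill- and Decode-stage throughput curves of the Qwen3.5 2B model:
Figure 6 Qwen3.5 2B Prefill and Decode throughput curves
Qwen3.5 4B Model
The figure below shows the Prefill- and Decode-stage throughput curves of the Qwen3.5 4B model:
Figure 7 Qwen3.5 4B Prefill and Decode throughput curves
Qwen3.5 9B Model
The figure below shows the Prefill- and Decode-stage throughput curves of the Qwen3.5 9B model:
Figure 8 Qwen3.5 9B Prefill and Decode throughput curves
Qwen3.6 27B Model
The figure below shows the Prefill- and Decode-stage throughput curves of the Qwen3.6 27B model:
Figure 9 Qwen3.6 27B Prefill and Decode throughput curves
Qwen3.6 35B-A3B Model
The figure below shows the Prefill- and Decode-stage throughput curves of the Qwen3.6 35B-A3B model:
Figure 10 Qwen3.6 35B-A3B Prefill and Decode throughput curves
GPT-OSS 20B Model
The figure below shows the Prefill- and Decode-stage throughput curves of the GPT-OSS 20B model:
Figure 11 GPT-OSS 20B Prefill and Decode throughput curves
Qwen2.5-VL 7B Model
The figure below shows the Prefill- and Decode-stage throughput curves of the Qwen2.5-VL 7B model:
Figure 12 Qwen2.5-VL 7B Prefill and Decode throughput curves
Qwen3-VL 4B Model
The figure below shows the Prefill- and Decode-stage throughput curves of the Qwen3-VL 4B model:
Figure 13 Qwen3-VL 4B Prefill and Decode throughput curves
Qwen3-VL 8B Model
The figure below shows the Prefill- and Decode-stage throughput curves of the Qwen3-VL 8B model:
Figure 14 Qwen3-VL 8B Prefill and Decode throughput curves
Gemma 4 26B-A4B Model
The figure below shows the Prefill- and Decode-stage throughput curves of the Gemma 4 26B-A4B model:
Figure 15 Gemma 4 26B-A4B Prefill and Decode throughput curves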