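5. Application Development Practices

This chapter walks through a simple application development example so that developers can quickly learn how to quantize, compile, and run inference on models with the Houmo software platform.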

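5.1. CV Models

Using ResNet50 as an example, this section shows the complete workflow from the original model to a deployable inference model. The workflow has three stages:

Model quantization: Quantize the floating-point model into a low-precision model to improve inference efficiency.
Model compilation: Compile the quantized model so that it meets the inference requirements of the Houmo M50 hardware.
Model inference: Use the compiled model to classify an image and output the predicted classes with their probabilities.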
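
To make the quantization stage more concrete, the sketch below shows one common way to map floating-point values to 8-bit integers (symmetric per-tensor quantization). It is a generic illustration of the idea and is not HMQuantool's algorithm; the function names are hypothetical.

```python
import numpy as np


def quantize_int8(values):
    """Symmetric per-tensor quantization of a float array to int8."""
    max_abs = float(np.max(np.abs(values)))
    # Map the largest magnitude to 127; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = np.clip(np.round(values / scale), -127, 127).astype(np.int8)
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from int8 codes."""
    return quantized.astype(np.float32) * scale


weights = np.array([-1.0, -0.5, 0.0, 0.25, 1.0], dtype=np.float32)
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
# The rounding error per value is bounded by about half a quantization step.
max_error = float(np.max(np.abs(weights - restored)))
print(codes, scale, max_error)
```

The trade-off is visible in `max_error`: storage and arithmetic become cheaper, at the cost of a small, bounded rounding error per value.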

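The development code can be run in the Docker image provided by the software platform. For details on deploying the Docker image, see Installation and Deployment.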

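5.1.1. Quantizing the Model

The software platform provides the HMQuantool tool for model quantization. The example below shows the code for the key steps. It is for reference only and cannot be copied and run directly.

Note

Model quantization is supported only on Ubuntu 24.04 GPU and Ubuntu 24.04 CPU environments. For the other environment dependencies, see the "HMQuantool Quantization Tool User Guide".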
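
Before starting, it can help to confirm that the host actually runs Ubuntu 24.04. The sketch below parses the standard os-release format; the helper names are hypothetical and are not part of HMQuantool.

```python
def parse_os_release(text):
    """Parse the KEY=value lines of an os-release file into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key] = value.strip().strip('"')
    return fields


def is_ubuntu_2404(fields):
    """Return True when the parsed fields describe Ubuntu 24.04."""
    return fields.get("ID") == "ubuntu" and fields.get("VERSION_ID") == "24.04"


# Sample content in the same format as /etc/os-release.
sample = 'NAME="Ubuntu"\nID=ubuntu\nVERSION_ID="24.04"\n'
print(is_ubuntu_2404(parse_os_release(sample)))  # True
```

On a real host, pass the contents of /etc/os-release to `parse_os_release` instead of the sample string.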

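5.1.1.1. Model Quantization Steps

The main quantization steps are as follows:

Prepare the original ResNet50 ONNX model.

The model must be an FP16 ONNX file. If the original model uses another data type, convert it to FP16 before use. In addition, for better performance, ONNX is the recommended input format.

To download the example model:
1. Download the application development example package (sample code).
2. Go to the houmo-examples-xh2/models/backbone/resnet50 directory.
3. Run the following script to download the ResNet50 model: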
```
python3 get_model.py --type raw
```

Create a model quantization script, for example named ptq.py, and write it as follows:

Import the dependencies.

import os
import shutil
import torch
from xhquant.api import (
    DeviceType,
    QuantScheme,
    convert_onnx_to_hmonnx,
    create_quant_config,
)

Prepare the input parameters. Change the onnx_file parameter to the actual path of your model.

# Set the path of input model
onnx_file = "./resnet50.onnx"
# Input shape in NCHW layout; (1, 3, 224, 224) is assumed here to match
# the 224 x 224 input used in the inference step
input_shape = (1, 3, 224, 224)
# Generate random input data
random_data = torch.randn(input_shape, dtype=torch.float32)
# Set directory to save quantized model
output_dir = "output/xh2"
quant_type = "w8a8h1_sefp"
hmonnx_model_path = os.path.join(output_dir, f"resnet50_xh2_{quant_type}.onnx")
# Set input and output node names of the quantized model
input_name = "input.1"
output_name = "495"
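
As a pre-flight step, the paths above can be checked before quantization starts. The following is a minimal sketch using only the standard library; `prepare_quant_paths` is a hypothetical helper, not part of the example package, and the demonstration uses a temporary directory with an empty placeholder file instead of a real model.

```python
import os
import tempfile


def prepare_quant_paths(onnx_file, output_dir, quant_type):
    """Check the input model and return the path for the quantized model."""
    if not os.path.isfile(onnx_file):
        raise FileNotFoundError(f"Input model not found: {onnx_file}")
    # Create the output directory if it does not exist yet.
    os.makedirs(output_dir, exist_ok=True)
    return os.path.join(output_dir, f"resnet50_xh2_{quant_type}.onnx")


# Demonstrate with a temporary directory and an empty placeholder file.
with tempfile.TemporaryDirectory() as tmp:
    onnx_file = os.path.join(tmp, "resnet50.onnx")
    open(onnx_file, "wb").close()
    output_dir = os.path.join(tmp, "output", "xh2")
    hmonnx_model_path = prepare_quant_paths(onnx_file, output_dir, "w8a8h1_sefp")
    created = os.path.isdir(output_dir)
    file_name = os.path.basename(hmonnx_model_path)
    print(file_name)  # resnet50_xh2_w8a8h1_sefp.onnx
```

Failing early on a missing input file gives a clearer error than one raised from inside the quantization call.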

Initialize the quantization environment.
```
xhquant_init(None, debug=args.debug)
```

Set the quantization parameters.

# Set quantization configurations
quant_scheme = QuantScheme(target_device=DeviceType.XH2a, quant_type=quant_type)
quant_config = create_quant_config(quant_scheme)

Quantize the model.

convert_onnx_to_hmonnx(
    onnx_file,  # Input model path
    [random_data],  # Random data
    device_type=DeviceType.XH2a,  # Device type for quantization
    out_hmonnx_file=hmonnx_model_path,  # Output path
    quant_config=quant_config,  # Quantization configurations
    input_names=[input_name],  # Input node name of the quantized model
    output_names=[output_name],  # Output node name of the quantized model
)

Run the script to quantize the model:
```
python3 ptq.py
```
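The quantized model file (.onnx) is saved in the output/xh2 directory.

For the complete example code, see houmo-examples-xh2/apis/converts/resnet50/xh2/ptq.py in the application development example package (sample code).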

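5.1.2. Compiling the Model

After the model is quantized, it can be compiled with the 后摩大道® M50 TCIM inference acceleration engine. The example below shows the code for the key steps. It is for reference only and cannot be copied and run directly.

5.1.2.1. Model Compilation Steps

The steps for compiling the ResNet50 model are as follows:

Note

Models can be compiled only on Linux x86 architecture.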
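
A build script can guard against running on an unsupported host. The sketch below uses Python's standard platform module; `is_supported_build_host` is a hypothetical helper, and treating "x86" as 64-bit x86 (x86_64 or amd64) is an assumption.

```python
import platform


def is_supported_build_host(system, machine):
    """Return True for a Linux host with a 64-bit x86 CPU."""
    # Assumption: "Linux x86" refers to x86_64 (also reported as amd64).
    return system == "Linux" and machine.lower() in ("x86_64", "amd64")


supported = is_supported_build_host(platform.system(), platform.machine())
print(f"Build host supported: {supported}")
```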

Create a model compilation script, for example named build.py, and write it as follows:

Import the dependencies.

import os
import tcim
import tcim_lite
import numpy as np
import logging

Prepare the input parameters. Change the hmonnx_model_path parameter to the actual path of your model.

# Set input and output model path
output_dir = "output/xh2"
# Quantization type used when the model was quantized (same value as in ptq.py)
quant_type = "w8a8h1_sefp"
hmonnx_model_path = os.path.join(output_dir, f"resnet50_xh2_{quant_type}.onnx")
# Set compilation configurations
ncore = 1
opt_level = "O2"
work_dir = os.path.join(output_dir, "tcim")
# Set output model names
hmmodel_name = f"resnet50_xh2_1batch_{ncore}core_{opt_level}"
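
The naming scheme above can be wrapped in a small helper so that later steps can locate the compiled file. This is a sketch only: `compiled_model_path` is hypothetical, and it assumes the compiler writes a file named after the output model name with the .hmm extension into output_dir.

```python
import os


def compiled_model_path(output_dir, ncore, opt_level, batch=1):
    """Compose the expected path of the compiled model file."""
    hmmodel_name = f"resnet50_xh2_{batch}batch_{ncore}core_{opt_level}"
    # Assumption: the compiler writes <output_name>.hmm into output_dir.
    return os.path.join(output_dir, hmmodel_name + ".hmm")


path = compiled_model_path("output/xh2", ncore=1, opt_level="O2")
print(os.path.basename(path))  # resnet50_xh2_1batch_1core_O2.hmm
```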

Call build_from_hmonnx to compile the quantized model.

tcim.build_from_hmonnx(
    hmonnx_model_path,  # Input model for compilation
    output_name=hmmodel_name,  # Output model name
    ncore=ncore,  # The number of cores for compilation
    opt_level=opt_level,  # The optimization level applied during compilation
    target="xh2",  # Target hardware platform
    batch=1,  # The batch size of the model
    output_dir=output_dir,  # The directory to save the output
    work_dir=work_dir,  # The directory for saving intermediate files
)

Run the script to compile the model.
```
python3 build.py
```
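The compiled model file (.hmm) is saved in the output_dir directory.

For the complete example code, see houmo-examples-xh2/apis/converts/resnet50/xh2/build.py in the application development example package (sample code).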

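5.1.2.2. Getting a Precompiled Model Directly

If you do not have a compilation environment, you can download the precompiled ResNet50 model directly:

1. Download the application development example package (sample code).
2. Go to the houmo-examples-xh2/models/backbone/resnet50 directory.
3. Run the following script to download the compiled model: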
```
python3 get_model.py --type hmm
```

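5.1.3. Running Inference

After the model is compiled, you can run inference with the TCIM Python API. The example below shows how to run inference on the ResNet50 model to complete an image classification task. TCIM automatically manages the memory needed during inference, so you do not need to allocate memory for input and output tensors manually. Host memory is used by default.

Create a model inference script, for example named resnet50.py, and write it as follows:

Import the dependencies.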

import os
import sys
import numpy as np
import cv2
import torch
import tcim_lite as tcim

Load the compiled model. Change the model_path parameter to the actual path of your model.

model_path = "./resnet50.hmm"
module = tcim.runtime.load(model_path)

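Preprocess the image. The example applies the following processing to the snake.jpg image:

Convert the image format from BGR to RGB.
Resize the image to 224 x 224.
Normalize each channel.
Rearrange the data into NCHW layout.
Convert to float16 to match the model input requirements.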

# Load the image
input_data = cv2.imread("../../data/snake.jpg")
# Convert image format from BGR to RGB
input_data = cv2.cvtColor(input_data, cv2.COLOR_BGR2RGB)
# Resize the image to (224, 224)
input_data = cv2.resize(input_data, (224, 224))
# Define the mean values for each RGB channel
mean_arr = np.array([123.675, 116.28, 103.53])
# Define the standard deviation values for each RGB channel
std_arr = np.array([58.395, 57.12, 57.375])
# Normalize the image
image_norm = (input_data - mean_arr) / std_arr
# Change the data layout from HWC to CHW
image_norm = np.transpose(image_norm, (2, 0, 1))
# Add a batch dimension
image_norm = np.expand_dims(image_norm, axis=0)
# Convert the data type to float16 to match model input requirements
input_data = image_norm.astype(np.float16)
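
The normalization and layout steps can be checked in isolation, without OpenCV or the hardware. The sketch below applies them to a synthetic all-zero image that stands in for a decoded and resized photo; `normalize_to_nchw` is a hypothetical helper that reuses the mean and standard deviation values shown above.

```python
import numpy as np

MEAN = np.array([123.675, 116.28, 103.53], dtype=np.float32)
STD = np.array([58.395, 57.12, 57.375], dtype=np.float32)


def normalize_to_nchw(image_rgb):
    """Normalize an HWC RGB image and return a float16 NCHW batch of one."""
    image = image_rgb.astype(np.float32)
    image = (image - MEAN) / STD            # Broadcasts over the channel axis
    image = np.transpose(image, (2, 0, 1))  # HWC -> CHW
    image = np.expand_dims(image, axis=0)   # CHW -> NCHW
    return np.ascontiguousarray(image, dtype=np.float16)


# Synthetic stand-in for a decoded and resized 224 x 224 RGB image.
dummy = np.zeros((224, 224, 3), dtype=np.uint8)
batch = normalize_to_nchw(dummy)
print(batch.shape, batch.dtype)  # (1, 3, 224, 224) float16
```

Checking the shape and dtype before calling the runtime makes layout mistakes (for example, passing HWC data) easier to spot.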

The snake.jpg image is available in the houmo-examples-xh2/apis/data directory of the application development example package (sample code).

Set the input data.

# Get the total number of inputs
input_num = module.get_num_inputs()
# For each input
for index in range(input_num):
    # Get the input name
    input_name = module.get_input_name(index)
    # Get the information about input data
    input_info = module.get_input_info(input_name).ascontiguous()
    # Set input data to the input with the given input name
    module.set_input(input_name, input_data)

Run inference and synchronize.
```
module.run()
module.sync()
```

Get the output data.

result_check = True
# Get the total number of outputs
output_num = module.get_num_outputs()
# For each output:
for index in range(output_num):
    # Get the output name
    output_name = module.get_output_name(index)
    # Get the information about output data
    output_info = module.get_output_info(output_name).ascontiguous().astype(np.float32)
    # Get the output data
    output_data = module.get_output(output_name).astype(np.float32).numpy()

Display the prediction results, including prob (confidence) and cls (predicted class ID).

from postprocess import softmax
# Convert output data into probabilities
output_data = softmax(output_data)
topk = 5
# Sort the output data in descending order and select the top topk predictions
pred_list = np.argsort(-output_data, axis=1, kind="quicksort").flatten()[0:topk]
prob_list = output_data.flatten()
# Iterate through the top predictions
for i, cls_id in enumerate(pred_list):
    print("top {}: predict cls = {}, prob = {:.6f}".format(i + 1, cls_id, prob_list[cls_id]))

# Check the result; modify it when you change the model or data
assert pred_list[0] == 65

print("<=== resnet50 python example completed.\n")
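
The softmax helper is imported from a postprocess module that is not shown in this section. A numerically stable version might look like the sketch below. This is an assumption about its behavior, not the actual implementation in the example package, and `top_k` is a hypothetical helper added for illustration.

```python
import numpy as np


def softmax(logits, axis=-1):
    """Numerically stable softmax along the given axis."""
    # Subtracting the maximum avoids overflow in exp() for large logits.
    shifted = logits - np.max(logits, axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=axis, keepdims=True)


def top_k(probabilities, k):
    """Return the k most likely class IDs with their probabilities."""
    flat = probabilities.flatten()
    class_ids = np.argsort(-flat)[:k]
    return [(int(i), float(flat[i])) for i in class_ids]


logits = np.array([[1.0, 3.0, 0.5, 2.0]], dtype=np.float32)
probs = softmax(logits)
ranked = top_k(probs, k=2)
print(ranked)
```

For these logits, class 1 ranks first and class 3 second, because softmax preserves the ordering of its inputs.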

Run the script to perform inference.
```
python3 resnet50.py
```

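For the complete example code, see houmo-examples-xh2/apis/inferences/resnet50.py in the application development example package (sample code).

5.2. LLM Models

For details on quantizing, compiling, and running inference on LLMs, see Running the Qwen3.5 Model Example.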