4. Command Reference

From the directory that contains the tool, run the following command to retrieve information about a compiled model:

./readhmm [OPTIONS] <binary_model_file_name>

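The command arguments are described below:

<binary_model_file_name>: Required. Path to the compiled model file to query. The tool extracts all reported information from this model.
[OPTIONS]: Options that select which model information to retrieve. The supported options are:
- -h, --help: Show the help message and exit.
- -v, --version: Show the tool version and exit.
- --save_json [file_path]: Save all model information to a JSON file. The optional file_path argument specifies the output file path; if it is omitted, the default path (an empty string) is used.
- --info: Show the configuration the model was compiled with, mainly the options set through the tcim.builder.api.build_from_hmonnx API. The tool prints this information by default even when the option is not given explicitly.
- --all: Show all model information, including input and output tensors, quantization information, memory usage, and submodel information.
- --mem_usage: Show the model's estimated memory usage on an M50 device.
- --input: Show information about the model's input tensors.
- --output: Show information about the model's output tensors.
- --quant_info: Show the model's quantization configuration.
- --models: Show the submodels contained in a multi-model file. This option applies only to deployments across multiple Houmo devices.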
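
When the tool is called from a script, the options above can be assembled programmatically. The following is a minimal Python sketch, not part of the tool itself. It assumes that the binary is available as ./readhmm and that the tool writes a single JSON document to standard output, as the examples in this section suggest. The helper names build_readhmm_cmd and run_readhmm are hypothetical.

```python
import json
import subprocess

# Query options listed in this section (--help, --version and --save_json
# are deliberately left out of this sketch).
QUERY_FLAGS = ("info", "all", "mem_usage", "input", "output", "quant_info", "models")


def build_readhmm_cmd(model_path, flags=("info",), tool="./readhmm"):
    """Build the argument list for: ./readhmm [OPTIONS] <binary_model_file_name>."""
    unknown = [f for f in flags if f not in QUERY_FLAGS]
    if unknown:
        raise ValueError(f"unsupported flag(s): {unknown}")
    return [tool, *(f"--{f}" for f in flags), model_path]


def run_readhmm(model_path, flags=("info",), tool="./readhmm"):
    """Run the tool and parse its output.

    Assumption: stdout contains exactly one JSON document.
    """
    cmd = build_readhmm_cmd(model_path, flags, tool)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)


if __name__ == "__main__":
    # Only builds the command; nothing is executed here.
    print(build_readhmm_cmd("resnet50.hmm", ["input", "output"]))
    # prints ['./readhmm', '--input', '--output', 'resnet50.hmm']
```

Passing the arguments as a list avoids shell quoting problems when the model path contains spaces.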

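4.1. Examples

4.1.1. Getting Model Compilation Information

From the directory that contains the tool, run the following command to retrieve the model's compile-time configuration: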

./readhmm --info resnet50.hmm

Example output:

{
  "info": {
    "writable_enable": "1",
    "target": "xh2",
    "commit_hash": "8af859b19",
    "core_num": 1,
    "git_status": "clean",
    "name": "resnet50_xh2_b1_1roi_1core_O2_static",
    "args_num": 4,
    "batch_num": 1,
    "hmquant_version": "xh2a_1.4.0",
    "version": "v1.4.0",
    "pack_inout": "false",
    "tile_num": 16,
    "compatibility_version": "cv1",
    "build_option": "{'target': 'xh2', 'opt_level': 2, 'march': 'v2', 'jobs': 16, 'output_name': 'resnet50_xh2_b1_1roi_1core_O2_static', 'set_batch_size': 1, 'modify_llm': {}, 'ncore': 1, 'enable_profile': False, 'enable_riscv_trace': False, 'intrinsic_mode': True, 'binary_mode': 0, 'analyze_ddr_bandwidth_usage': False, 'profile_primitive_operator': '', 'enable_model_io_connect_in_device': True, 'enable_bundle_lora_param': False, 'enable_dynamic_image_resize': False, 'llm_opt': False, 'ndevice': 0, 'one_img_multi_roi': False, 'subgraph_level': 0, 'subgraph_repeat_hint': 20, 'enable_xh2_stable_output': False, 'enable_xh2_sparse_feature': False, 'flash_attention': 0, 'cpp_backend': 'v2', 'skip_check': True, 'emit_cpp_extra_args': '', 'device_kernel_split': 1, 'skip_prolog_epilog_recording': False, 'subtarget': None, 'constant_from_file_dir': '', 'moe_device_sharding': 'ep'}",
    "custom_msg": "{\"input.1\": {\"shape\": [1, 3, 224, 224], \"resizer_mode\": 3, \"input_cfg\": {\"shape\": [1, 3, 224, 224], \"data_format\": \"RGB\", \"mean\": [123.675, 116.28, 103.53], \"std\": [58.395, 57.12, 57.375], \"resize_type\": 0, \"resizer\": {\"toYUV_format\": \"YUV420SP\"}}}}"
  }
}
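
In the output above, build_option and custom_msg are strings that themselves contain structured data. The sketch below shows one way to decode them in Python. It assumes, based only on this example, that build_option is formatted as a Python dict literal (single quotes, False, None) and that custom_msg is a JSON document; other models or tool versions may differ. The function name decode_info is hypothetical.

```python
import ast
import json


def decode_info(info):
    """Return (build_option, custom_msg) as Python dicts.

    Assumption: build_option is a Python-literal dict and custom_msg is JSON,
    as in the example output.
    """
    # literal_eval only accepts literals, so it does not execute code.
    build_option = ast.literal_eval(info["build_option"])
    custom_msg = json.loads(info["custom_msg"])
    return build_option, custom_msg


# Shortened copy of the two fields from the example output.
info = {
    "build_option": "{'target': 'xh2', 'opt_level': 2, 'ncore': 1, "
                    "'enable_profile': False, 'subtarget': None}",
    "custom_msg": '{"input.1": {"shape": [1, 3, 224, 224], "resizer_mode": 3}}',
}

build_option, custom_msg = decode_info(info)
print(build_option["opt_level"])       # prints 2
print(custom_msg["input.1"]["shape"])  # prints [1, 3, 224, 224]
```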

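4.1.2. Getting Model Input and Output Information

From the directory that contains the tool, run the following command to retrieve information about the model's input and output tensors: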

./readhmm --input --output resnet50.hmm

Example output:

{
  "output": [
    {
      "mem_type": "DEV0",
      "dtype": "FLOAT16",
      "block_layout": [
        1,
        128
      ],
      "name": "495",
      "format": "FMT_ND",
      "shape": [
        1,
        1000
      ],
      "stride": [
        2048,
        2
      ]
    }
  ],
  "info": {
    "writable_enable": "1",
    "target": "xh2",
    "commit_hash": "8af859b19",
    "core_num": 1,
    "git_status": "clean",
    "name": "resnet50_xh2_b1_1roi_1core_O2_static",
    "args_num": 4,
    "batch_num": 1,
    "hmquant_version": "xh2a_1.4.0",
    "version": "v1.4.0",
    "pack_inout": "false",
    "tile_num": 16,
    "compatibility_version": "cv1",
    "build_option": "{'target': 'xh2', 'opt_level': 2, 'march': 'v2', 'jobs': 16, 'output_name': 'resnet50_xh2_b1_1roi_1core_O2_static', 'set_batch_size': 1, 'modify_llm': {}, 'ncore': 1, 'enable_profile': False, 'enable_riscv_trace': False, 'intrinsic_mode': True, 'binary_mode': 0, 'analyze_ddr_bandwidth_usage': False, 'profile_primitive_operator': '', 'enable_model_io_connect_in_device': True, 'enable_bundle_lora_param': False, 'enable_dynamic_image_resize': False, 'llm_opt': False, 'ndevice': 0, 'one_img_multi_roi': False, 'subgraph_level': 0, 'subgraph_repeat_hint': 20, 'enable_xh2_stable_output': False, 'enable_xh2_sparse_feature': False, 'flash_attention': 0, 'cpp_backend': 'v2', 'skip_check': True, 'emit_cpp_extra_args': '', 'device_kernel_split': 1, 'skip_prolog_epilog_recording': False, 'subtarget': None, 'constant_from_file_dir': '', 'moe_device_sharding': 'ep'}",
    "custom_msg": "{\"input.1\": {\"shape\": [1, 3, 224, 224], \"resizer_mode\": 3, \"input_cfg\": {\"shape\": [1, 3, 224, 224], \"data_format\": \"RGB\", \"mean\": [123.675, 116.28, 103.53], \"std\": [58.395, 57.12, 57.375], \"resize_type\": 0, \"resizer\": {\"toYUV_format\": \"YUV420SP\"}}}}"
  },
  "input": [
    {
      "mem_type": "DEV0",
      "dtype": "UINT8",
      "block_layout": [
        1,
        1,
        1
      ],
      "name": "input.1.y",
      "format": "FMT_ND",
      "shape": [
        1,
        224,
        224
      ],
      "stride": [
        50176,
        224,
        1
      ]
    },
    {
      "mem_type": "DEV0",
      "dtype": "UINT8",
      "block_layout": [
        1,
        1,
        1,
        1
      ],
      "name": "input.1.uv",
      "format": "FMT_ND",
      "shape": [
        1,
        112,
        112,
        2
      ],
      "stride": [
        25088,
        224,
        2,
        1
      ]
    }
  ]
}
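
The shape and stride values above can be cross-checked against each other. The sketch below computes the byte strides that a tightly packed row-major tensor would have and compares them with the reported ones. The element sizes are assumptions that cover only the two dtype names in this example, and the helper names are hypothetical. In this example the two inputs come out densely packed, while the output reports an outer stride of 2048 bytes for a 2000-byte row, which suggests row padding. That reading is an inference from the numbers, not a documented guarantee.

```python
# Element sizes in bytes for the dtype names that appear in the example.
# Assumption: other dtype names would have to be added by hand.
ITEMSIZE = {"UINT8": 1, "FLOAT16": 2}


def dense_strides(shape, itemsize):
    """Byte strides of a tightly packed row-major tensor."""
    strides = []
    step = itemsize
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return strides[::-1]


def is_padded(tensor):
    """True if the reported strides differ from the dense row-major strides."""
    dense = dense_strides(tensor["shape"], ITEMSIZE[tensor["dtype"]])
    return dense != tensor["stride"]


# Values copied from the example output.
output_495 = {"dtype": "FLOAT16", "shape": [1, 1000], "stride": [2048, 2]}
input_y = {"dtype": "UINT8", "shape": [1, 224, 224], "stride": [50176, 224, 1]}
input_uv = {"dtype": "UINT8", "shape": [1, 112, 112, 2], "stride": [25088, 224, 2, 1]}

print(is_padded(output_495))  # prints True
print(is_padded(input_y))     # prints False
print(is_padded(input_uv))    # prints False
```

The two inputs, input.1.y with shape [1, 224, 224] and input.1.uv with shape [1, 112, 112, 2], appear to correspond to the Y plane and the interleaved UV plane of a YUV420SP image, which matches the toYUV_format value in custom_msg.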

4.2. Output Field Reference

The command returns the following information:

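quant_info: Quantization configuration of each tensor in the model.
output: Information about each output tensor of the model:
- block_layout: Internal information; users can ignore it.
- shape: Shape of the output tensor.
- format: Storage format of the output tensor.
- stride: Stride of each dimension of the output tensor, in bytes.
- dtype: Data type of the output tensor.
- name: Name or identifier of the output tensor.
- mem_type: Internal information; users can ignore it.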
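mem_usage: Estimated memory usage of the model on the Houmo device:
- total: Total memory used by the model, in bytes, including weights, inputs and outputs, and so on.
- kernel: Memory used by kernels, in bytes.
- weights: Memory used by the model weights on the Houmo device, in bytes. Weights can be shared across models or devices; this field reports the total usage without sharing.
- inout: Memory used by the input and output tensors on the Houmo device, in bytes. Tensors can share memory on the device; this field reports the total usage without sharing.
- workspace: Memory used by the model's runtime workspace on the Houmo device, in bytes. The workspace can be shared in certain scenarios; this field reports the total usage without sharing.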
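info: Compile-time configuration of the model, including:
- target: Target Houmo chip model.
- custom_msg: Custom message or remarks field.
- commit_hash: Internal information; users can ignore it.
- build_option: Configuration parameters used at compile time.
- hmquant_version: Version of the quantization tool.
- git_status: Internal information; users can ignore it.
- pack_inout: Internal information; users can ignore it.
- batch_num: Batch size set when the model was compiled.
- core_num: Number of M50 device IPU cores set when the model was compiled.
- tile_num: Internal information; users can ignore it.
- args_num: Internal information; users can ignore it.
- version: Version of the compiler tool used to compile the model.
- name: Model name.
- compatibility_version: M50 hardware version. Reserved field; users can ignore it.
- writable_enable: Internal information; users can ignore it.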
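input: Information about each input tensor of the model:
- block_layout: Internal information; users can ignore it.
- shape: Shape of the input tensor.
- format: Storage format of the input tensor.
- stride: Stride of each dimension of the input tensor, in bytes.
- dtype: Data type of the input tensor.
- name: Name or identifier of the input tensor.
- mem_type: Internal information; users can ignore it.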
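
Several of the fields above are marked as internal or reserved. When the report is shown to end users, it can be convenient to drop them and to print byte counts in readable units. The following Python sketch illustrates this under the assumption that the report has already been parsed into a dict with the layout shown in section 4.1. The names user_view and format_bytes are hypothetical, and the field sets are taken from the descriptions in this section.

```python
# Fields this section marks as internal or reserved.
INTERNAL_INFO_FIELDS = {
    "commit_hash", "git_status", "pack_inout", "tile_num",
    "args_num", "writable_enable", "compatibility_version",
}
INTERNAL_TENSOR_FIELDS = {"block_layout", "mem_type"}


def user_view(report):
    """Return a copy of the report without internal or reserved fields."""
    view = dict(report)
    if "info" in view:
        view["info"] = {k: v for k, v in view["info"].items()
                        if k not in INTERNAL_INFO_FIELDS}
    for key in ("input", "output"):
        if key in view:
            view[key] = [
                {k: v for k, v in tensor.items() if k not in INTERNAL_TENSOR_FIELDS}
                for tensor in view[key]
            ]
    return view


def format_bytes(n):
    """Format a byte count such as a mem_usage field using binary units."""
    if n < 1024:
        return f"{n} B"
    for unit in ("KiB", "MiB", "GiB"):
        n /= 1024
        if n < 1024 or unit == "GiB":
            return f"{n:.2f} {unit}"


# Trimmed-down report using values from the example in section 4.1.2.
report = {
    "info": {"target": "xh2", "core_num": 1, "tile_num": 16},
    "output": [{"name": "495", "dtype": "FLOAT16", "mem_type": "DEV0"}],
}
print(user_view(report))
print(format_bytes(2048))
```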