Classes

Class Buffer

class Buffer

Manages memory buffers on the host CPU and on Houmo devices.

Key functions are as follows:

  • Memory allocation on host CPU and Houmo device.

  • Copying data from a specified memory address on host into the Houmo device buffer.

  • Copying data from the Houmo device buffer into a specified memory address on host.

  • Retrieving the pointer to the data stored in the buffer.

  • Retrieving the size of the allocated buffer.

Buffer

tcim::Buffer::Buffer()

Default constructor for the Buffer class.

tcim::Buffer::Buffer(Buffer&&) = default

Default move constructor for the Buffer class.

Parameters:

Buffer -- The Buffer object to be moved from.

tcim::Buffer::Buffer(const Buffer&) = default

Default copy constructor for the Buffer class.

Parameters:

Buffer -- The Buffer object to be copied.

Clone

Buffer tcim::Buffer::Clone(bool auto_copy = true) const

Creates a copy of the current Buffer object.

Parameters:

auto_copy -- [in] Determines if the data stored in the buffer is copied to the new Buffer object.

  • If set to true (default), a new buffer is allocated, and both the data stored in the buffer and its associated metadata (such as size and memory type) are copied to the new buffer.

  • If set to false, a new buffer is allocated, and only the metadata (such as size and memory type) is copied to the new buffer, but its memory is uninitialized and does not contain the original data.

Returns:

Returns a new Buffer object as a copy of the current Buffer object.
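
The two auto_copy modes can be illustrated with a stand-in type. This is a minimal sketch under stated assumptions, not the library's implementation: HostBlock and CloneBlock are hypothetical names, and the sketch zero-fills the memory in the false case, whereas the text above says the real buffer is left uninitialized.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a block of host memory; not the tcim::Buffer type.
struct HostBlock {
    std::vector<unsigned char> bytes;
};

// Mirrors the documented auto_copy behavior: the clone always has the same
// size, and receives the source contents only when auto_copy is true.
// Unlike the real API, std::vector zero-fills the memory in the false case.
inline HostBlock CloneBlock(const HostBlock& src, bool auto_copy = true) {
    if (auto_copy) {
        return HostBlock{src.bytes};
    }
    return HostBlock{std::vector<unsigned char>(src.bytes.size())};
}
```

Passing false is the cheaper choice when the clone is about to be overwritten anyway, for example as the destination of a later copy.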

CopyFromHost

Status tcim::Buffer::CopyFromHost(const void *src, size_t size, size_t offset = 0)

Copies data from host memory to the buffer on Houmo device.

Parameters:
  • src -- [in] Pointer to the source memory address on the host.

  • size -- [in] The size of buffer memory to be copied in bytes.

  • offset -- [in] The offset within the buffer where the data will be copied. Default is 0.

Returns:

Returns the status of the function call.

CopyTo

Status tcim::Buffer::CopyTo(tcim::Buffer &dst, size_t size = 0, size_t src_off = 0, size_t dst_off = 0) const

Copies data from the current buffer to the specified destination buffer on the same Houmo device.

Parameters:
  • dst -- [out] The destination buffer on the Houmo device.

  • size -- [in] The number of bytes to copy. If set to 0, the smaller size between the source and destination buffers is used by default.

  • src_off -- [in] The offset in the current buffer where the copy starts. Default is 0.

  • dst_off -- [in] The offset in the destination buffer where the copied data is placed. Default is 0.

Returns:

Returns the status of the function call.
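
The default rule for size can be written as a small helper. This is an illustrative sketch only: EffectiveCopySize is a hypothetical name, and because the description above does not say whether the offsets are subtracted before the comparison, the sketch applies the rule exactly as worded.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper: resolves the number of bytes a CopyTo-style call
// would copy. A requested size of 0 selects the smaller of the source and
// destination buffer sizes, as documented for the size parameter.
inline std::size_t EffectiveCopySize(std::size_t requested,
                                     std::size_t src_size,
                                     std::size_t dst_size) {
    return requested == 0 ? std::min(src_size, dst_size) : requested;
}
```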

CopyToHost

Status tcim::Buffer::CopyToHost(void *dst, size_t size, size_t offset = 0)

Copies data from the buffer on Houmo device to host memory.

Parameters:
  • dst -- [out] Pointer to the destination memory address.

  • size -- [in] The size of buffer memory to be copied in bytes.

  • offset -- [in] The offset within the buffer to start copying data from. Default is 0.

Returns:

Returns the status of the function call.

Data

void *tcim::Buffer::Data() const

Retrieves the memory address of the data stored in the current Buffer object.

Returns:

Returns a pointer to the memory address of the data within the current Buffer object.

Device

tcim::Device tcim::Buffer::Device() const

Retrieves whether the buffer is allocated on the host CPU or on a Houmo device.

Returns:

Returns the device type associated with the buffer.

DeviceId

int tcim::Buffer::DeviceId() const

Retrieves the logical device ID of the Houmo device associated with the current Buffer object.

Returns:

Returns the logical device ID of the Houmo device associated with the buffer.

GetInitStatus

Status tcim::Buffer::GetInitStatus() const

Retrieves the initialization status of the current Buffer object and related resources.

Returns:

Returns the initialization status of the current Buffer object.

GetSubBuffer

Buffer tcim::Buffer::GetSubBuffer(size_t size, size_t offset = 0) const

Retrieves a sub-buffer from the current buffer.

This method returns a sub-buffer that shares the same memory as the original buffer. The sub-buffer's lifecycle is tied to the parent buffer, and it does not allocate additional memory.

Parameters:
  • size -- The size of the sub-buffer to retrieve, in bytes. Must not exceed the remaining size of the buffer starting from the specified offset.

  • offset -- The starting position within the buffer from which the sub-buffer begins, in bytes. Defaults to 0 if not specified.

Returns:

A sub-buffer representing the specified portion of the original buffer.

Note

The returned sub-buffer is a view of the original buffer and does not manage its own memory. Ensure the parent buffer remains valid for the duration of the sub-buffer's usage.
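
The view semantics and the size constraint can be sketched with a non-owning stand-in. View and SubView are hypothetical names used only for illustration; the real Buffer type and its error reporting are not shown in this section.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-in for a non-owning view; not the tcim::Buffer type.
struct View {
    unsigned char* data;  // points into memory owned by the parent
    std::size_t size;     // number of bytes visible through the view
};

// Returns a view of [offset, offset + size) within parent, or std::nullopt
// when the requested range does not fit. No memory is allocated.
inline std::optional<View> SubView(const View& parent, std::size_t size,
                                   std::size_t offset = 0) {
    if (offset > parent.size || size > parent.size - offset) {
        return std::nullopt;
    }
    return View{parent.data + offset, size};
}
```

Because the view only stores a pointer into the parent's memory, a write through the view is visible in the parent, and the view dangles as soon as the parent's storage is released.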

MemSet

Status tcim::Buffer::MemSet(int8_t value = 0, size_t size = 0, size_t offset = 0)

Initializes the specified region of the device buffer to a constant byte value.

Parameters:
  • value -- [in] The byte value used to initialize the target device memory region. Defaults to 0.

  • size -- [in] The number of bytes to initialize. If set to 0, all bytes from offset to the end of the device buffer are initialized.

  • offset -- [in] Byte offset from the beginning of the device buffer. Defaults to 0.

Returns:

Returns the status of the function call.
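
The meaning of size == 0 can be expressed as a one-line calculation. ResolveFillSize is a hypothetical helper written for this illustration, and the precondition that offset does not exceed the buffer size is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of bytes a MemSet-style call would touch.
// A size of 0 means "from offset to the end of the buffer".
inline std::size_t ResolveFillSize(std::size_t buffer_size, std::size_t size,
                                   std::size_t offset) {
    assert(offset <= buffer_size);  // precondition assumed by this sketch
    return size == 0 ? buffer_size - offset : size;
}
```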

operator=

Buffer &tcim::Buffer::operator=(Buffer&&) = default

Default move assignment operator for the Buffer class.

Parameters:

Buffer -- The Buffer object to be moved from.

Returns:

A reference to the Buffer object assigned the values of another.

Buffer &tcim::Buffer::operator=(const Buffer&) = default

Default copy assignment operator for the Buffer class.

Parameters:

Buffer -- The Buffer object to be copied.

Returns:

A reference to the Buffer object assigned the values of another.

Size

size_t tcim::Buffer::Size() const

Retrieves the size of the current Buffer object in bytes.

Returns:

Returns the size of the current Buffer object.

CreateDeviceBuffer

static Buffer tcim::Buffer::CreateDeviceBuffer(size_t size, int device_id = 0, const std::string &backend_name = "", const std::string &mem_type = "")

Allocates a buffer on Houmo device with the given memory size and logical device ID.

Parameters:
  • size -- [in] The size of buffer memory to allocate in bytes.

  • device_id -- [in] The logical device ID of the Houmo device on which the buffer memory is allocated. Device 0 is used by default. You can retrieve the logical device IDs via the SMI tool. See "SMI Tool User Guide" for details.

  • backend_name -- [in] The backend name of the Houmo device. Only the default value can be used. You can retrieve the backend name via Module::GetBackendName.

  • mem_type -- [in] The special memory type of the Houmo device. Default is "".

Returns:

Returns a new Buffer object with memory allocated on the specified device.

static Buffer tcim::Buffer::CreateDeviceBuffer(void *dev_ptr, size_t size, int device_id, const std::string &backend_name)

Creates a Buffer object that represents a memory region on the Houmo device. This function does not allocate new memory but uses the memory region specified by dev_ptr.

Note

You are responsible for:

  • Ensuring the validity of the memory at dev_ptr before calling this function.

  • Keeping dev_ptr valid until all associated Buffer objects are destroyed.

  • Releasing the memory at dev_ptr only after all associated Buffer objects are no longer in use.

Parameters:
  • dev_ptr -- [in] Pointer to the starting memory address of a valid memory region on the Houmo device.

  • size -- [in] The size of the memory region in bytes for the Buffer object. This size must not exceed the allocated memory size at dev_ptr.

  • device_id -- [in] The logical device ID of the Houmo device on which the memory is located. This overload declares no default value, so the ID must be passed explicitly. You can retrieve the logical device IDs via the SMI tool. See "SMI Tool User Guide" for details.

  • backend_name -- [in] The backend name of the Houmo device. Only the default value can be used. You can retrieve the backend name via Module::GetBackendName.

Returns:

Returns a new Buffer object representing the specified memory region.

CreateHostBuffer

static Buffer tcim::Buffer::CreateHostBuffer(size_t size, void *ptr = nullptr)

Allocates a buffer on host CPU with the given memory size.

If ptr is not nullptr, the buffer uses the existing memory region at ptr instead of allocating new memory.

Parameters:
  • size -- [in] The size of buffer memory to be allocated in bytes.

  • ptr -- [in] A pointer to the starting memory address used for the buffer. Defaults to nullptr, in which case new memory is allocated. If size is set to 0, no memory is allocated.

Returns:

Returns a new Buffer object with memory allocated on the host.

Class CompTensor

class CompTensor

Composite tensor container supporting split/merge operations with offset tracking.

Manages hierarchical tensor decomposition and reconstruction while maintaining spatial relationships between sub-regions.

CompTensor

tcim::CompTensor::CompTensor(Tensor &&tensor)

Construct from rvalue tensor (move semantics).

Parameters:

tensor -- [in] Temporary tensor source. Ownership of the tensor data is transferred to the composite container. The data format of the input tensor must be TCIM::DataFmt::CompND.

Warning

Original tensor becomes invalid after this operation.

tcim::CompTensor::CompTensor(Tensor &tensor)

Construct from lvalue tensor (copy semantics).

Parameters:

tensor -- [in] Source tensor. Ownership of the tensor data is transferred to the composite container. The data format of the input tensor must be TCIM::DataFmt::CompND.

Warning

Original tensor becomes invalid after this operation.

AsTensor

Tensor &tcim::CompTensor::AsTensor()

Convert to ND tensor representation.

Returns:

Reference to underlying tensor storage.

Warning

Modifications may desynchronize with sub-tensor offsets.

GetInitStatus

Status tcim::CompTensor::GetInitStatus() const

Query composite structure initialization status.

Example

  • StatusCode::ALLOC_FAILURE

Example

  • See error handling in CompTensorDemo.cpp

Returns:

Status object containing:

  • Status::OK

  • Status::UNINITIALIZED

SubTensorOffsets

std::vector<std::vector<int64_t>> &tcim::CompTensor::SubTensorOffsets() const

Access mutable offset coordinates of sub-tensors.

Returns:

Reference to 2D vector where:

  • First dimension indexes sub-tensors.

  • Second dimension contains [N-dimensional offset coordinates].

Warning

Modifications may invalidate spatial consistency.

SubTensors

std::vector<tcim::Tensor> &tcim::CompTensor::SubTensors() const

Access immutable sub-tensor collection.

Returns:

Reference to vector containing:

  • Ordered sub-tensor sequence.

  • Shared metadata with parent tensor.

Note

Sub-tensors maintain spatial relationships defined by offsets.

MergeCompNd

static Tensor tcim::CompTensor::MergeCompNd(const std::vector<Tensor> &roi_tensor, const std::vector<std::vector<int64_t>> &roi_offsets)

Reconstruct composite tensor from sub-regions.

Example

  • See tensor reconstruction in CompTensorDemo::MergeDemo().

Parameters:
  • roi_tensor -- [in] Vector of region-of-interest sub-tensors.

  • roi_offsets -- [in] Corresponding N-dimensional offsets.

Returns:

Merged tensor satisfying:

  • shape == sum(sub_tensor_shapes)

  • dtype == sub_tensors[0].dtype()

  • format == tcim::Format::CompND

Note

Sub-tensors share the original tensor storage.

Class DevManager

class DevManager

Describes the device set on which the target model runs. For multi-device models, initialization must be performed using DevManager.

The DevManager class provides methods to manage and operate device collections, including creating device manager instances, retrieving device counts, and checking initialization status. Multi-device models must use this class for device management.

DevManager

tcim::DevManager::DevManager() = default

Default constructor.

tcim::DevManager::DevManager(const DevManager&) = default

Default copy constructor.

Parameters:

other -- The DevManager instance to copy.

tcim::DevManager::DevManager(DevManager&&) = default

Default move constructor.

Parameters:

other -- The DevManager instance to move.

~DevManager

virtual tcim::DevManager::~DevManager()

Destructor.

DevCount

int tcim::DevManager::DevCount() const

Retrieves the number of devices.

Returns:

The number of devices managed by this instance.

GetInitStatus

Status tcim::DevManager::GetInitStatus() const

Retrieves the initialization status.

Returns:

The initialization status, of type Status.

operator!=

bool tcim::DevManager::operator!=(const DevManager &other) const

Inequality comparison operator.

Parameters:

other -- The DevManager instance to compare with.

Returns:

true if the two DevManager instances are not equal, false otherwise.

operator=

DevManager &tcim::DevManager::operator=(const DevManager&) = default

Default copy assignment operator.

Parameters:

other -- The DevManager instance to copy.

Returns:

Reference to the current object.

DevManager &tcim::DevManager::operator=(DevManager&&) = default

Default move assignment operator.

Parameters:

other -- The DevManager instance to move.

Returns:

Reference to the current object.

operator==

bool tcim::DevManager::operator==(const DevManager &other) const

Equality comparison operator.

Parameters:

other -- The DevManager instance to compare with.

Returns:

true if the two DevManager instances are equal, false otherwise.

Verify

Status tcim::DevManager::Verify() const

Validates all devices managed by this DevManager.

Returns:

Returns Status::OK if all devices are valid, otherwise returns an error status.

Create

static DevManager tcim::DevManager::Create(const std::vector<int> &dev_indx, const std::string &backend_name = "")

Creates a DevManager instance using a list of device indices.

Parameters:
  • dev_indx -- A list of device indices.

  • backend_name -- The backend name, default is an empty string.

Returns:

A new DevManager instance.

static DevManager tcim::DevManager::Create(const std::vector<std::pair<std::string, int>> &devices)

Creates a DevManager instance using a list of backend name and device ID pairs.

Parameters:

devices -- A list of pairs. The first element of each pair is the backend name, and the second is the device ID.

Returns:

A new DevManager instance.

static DevManager tcim::DevManager::Create(int dev_id = 0, const std::string &backend_name = "")

Creates a DevManager instance.

Parameters:
  • dev_id -- The device ID, default is 0.

  • backend_name -- The backend name, default is an empty string.

Returns:

A new DevManager instance.

Class LogHandler

class LogHandler

Defines the interfaces for handling log messages generated by the library.

Notes

  • You must inherit from this class and implement the OnLog method to redirect log output (e.g., to a file, console, or external logging system).

  • The OnLog method may be called from multiple internal threads concurrently. Therefore, the implementation must be thread-safe.

  • Avoid performing heavy blocking operations within OnLog to prevent impacting the performance of the main execution pipeline.

~LogHandler

virtual tcim::LogHandler::~LogHandler() = default

Destructor.

Flush

inline virtual void tcim::LogHandler::Flush()

Flushes any buffered log data.

This method is called when the logger needs to ensure all pending logs are written to the destination (e.g., during shutdown or critical errors).

OnLog

virtual void tcim::LogHandler::OnLog(std::string_view msg, int level) = 0

Callback function triggered when a log event occurs.

Parameters:
  • msg -- [in] The formatted log message content.

  • level -- [in] The severity level of the log (corresponding to spdlog levels).
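
A handler that meets the thread-safety requirement could look like the following sketch. The LogHandler class here is a local stand-in that mirrors the signatures documented above so the example compiles without the library headers, and MemoryLogHandler is a hypothetical name. How a handler is registered with the library is not covered in this section, so registration is omitted.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Stand-in for the interface documented above, declared locally so the
// sketch compiles without the library headers.
class LogHandler {
public:
    virtual ~LogHandler() = default;
    virtual void Flush() {}
    virtual void OnLog(std::string_view msg, int level) = 0;
};

// Hypothetical handler that collects messages in memory. The mutex makes
// OnLog safe to call from several threads; the critical section only
// appends to a vector, so it stays short.
class MemoryLogHandler : public LogHandler {
public:
    void OnLog(std::string_view msg, int level) override {
        std::lock_guard<std::mutex> lock(mutex_);
        entries_.emplace_back(level, std::string(msg));
    }

    // Returns a copy of the collected (level, message) entries.
    std::vector<std::pair<int, std::string>> Entries() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return entries_;
    }

private:
    mutable std::mutex mutex_;
    std::vector<std::pair<int, std::string>> entries_;
};
```

The message is copied into a std::string because a std::string_view does not own its characters and may not remain valid after OnLog returns.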

Class Module

class Module

Represents a module instance used for model inference at runtime. The weight manager is created lazily, on the first call to Module::LoadFromFile.

You can initialize modules using the following methods with the binary model file (.hmm or .hmms):

  • The static function: Module::LoadFromFile. For example:

    auto module = Module::LoadFromFile("tcim_resnet50.hmm");
    

  • The constructor function: Module::Module. For example:

    Module module("tcim_resnet50.hmm");
    

  • The default constructor function. You can create an empty module and then load the binary model file with the Module::LoadModel method. For example:

    Module module;
    module.LoadModel("tcim_resnet50.hmm");
    

Note

If a Module object is defined as a global variable, it must be explicitly destroyed before the main function exits.

Nested Class Option

class Option

Represents the configurations for initializing a Module object and loading the binary model file (.hmm or .hmms).

Option

explicit tcim::Module::Option::Option(int device_id = 0)

Constructs an Option object with the specified device ID.

Parameters:

device_id -- [in] The logical ID of the Houmo device used for loading the model and running inference. Device 0 is used by default. You can retrieve the logical device IDs via the SMI tool. See "SMI Tool User Guide" for details.

explicit tcim::Module::Option::Option(const DevManager&)

Constructs an Option object with the specified DevManager.

Parameters:

dev_manager -- [in] The DevManager object that defines the device set on which the model will be loaded and run for inference.

explicit tcim::Module::Option::Option(Module::WeightManager&)

Constructs an Option object with the specified WeightManager.

Parameters:

weight_manager -- [in] The WeightManager object to manage the shared weight memory.

EnableIOLazyMode

Option &tcim::Module::Option::EnableIOLazyMode(bool lazy = true)

Controls when input and output buffers used during inference are allocated on host.

When lazy is set to true, input and output buffers are allocated only when they are accessed for the first time. This avoids allocating buffers for unused execution paths, and reduces runtime memory usage. The first access to a buffer may incur allocation overhead.

Parameters:

lazy -- [in] Specifies when buffers used to store input and output data are allocated. Set to true to allocate buffers on first access. Set to false to allocate all buffers during model initialization.

Returns:

Reference to the Option object.
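
The allocate-on-first-access behavior described for lazy mode can be pictured with a generic sketch. LazyBuffer is a hypothetical class written for illustration and says nothing about how the library implements its buffers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration of allocate-on-first-access; not library code.
class LazyBuffer {
public:
    explicit LazyBuffer(std::size_t size) : size_(size) {}

    // Allocates the storage on the first call, then reuses it.
    unsigned char* Data() {
        if (storage_.empty() && size_ > 0) {
            storage_.resize(size_);  // first access pays the allocation cost
        }
        return storage_.data();
    }

    // Reports whether the storage has been allocated yet.
    bool Allocated() const { return !storage_.empty(); }

private:
    std::size_t size_;
    std::vector<unsigned char> storage_;
};
```

A buffer that is never accessed never allocates, which is the memory saving the lazy option trades against first-access latency.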

EnableHostLazyLoading

Option &tcim::Module::Option::EnableHostLazyLoading(bool lazy = false)

Controls lazy buffer allocation strategy during model loading phase.

When lazy is set to true, buffer allocation and initialization on the host are deferred until they are actually needed. This strategy reduces the peak host memory footprint during model loading via Module::LoadFromFile and Module::LoadModel APIs. This is beneficial for memory-constrained environments. The trade-off is increased model loading latency due to on-demand buffer initialization.

When lazy is set to false, all buffers are pre-allocated and initialized during model loading, resulting in faster model load completion but higher memory usage.

Parameters:

lazy -- [in] Specifies the buffer allocation strategy for model loading. Set to true to save host memory usage by deferring buffer allocation and initialization. Set to false (default) to speed up model loading.

Returns:

Reference to the Option object.

SetDummyTensors

Option &tcim::Module::Option::SetDummyTensors(std::vector<std::string> &dummy_names)

Sets the names of the model tensors for which memory is not allocated at load time.

By default, memory for input and output tensors is allocated during model loading. For the tensors named through this function, that memory is not allocated when the model is loaded.

Notes

This function is used in scenarios where multiple models are inferred sequentially, with the output of one model as the input of another model. It optimizes memory usage by avoiding unnecessary memory allocation.

The memory for input and output tensors must be allocated before model inference. You can call Module::GetInput and Module::SetInput to set the output of one model as input of another model.

Example
// ...
// Create a weight manager
auto weight_manager = tcim::Module::WeightManager::CreateWeightManager(0);
// Create Option objects to set configurations for the models
tcim::Module::Option option1(weight_manager);
tcim::Module::Option option2(weight_manager);
// Set dummy tensor names
std::vector<std::string> dummy_tensor_names = {"model_layers"};
option2.SetDummyTensors(dummy_tensor_names);
// Load Qwen models
auto prefill_part1_model = tcim::Module::LoadFromFile("qwen2_prefill_part1.hmm", option1);
auto decode_part1_model = tcim::Module::LoadFromFile("qwen2_decode_part1.hmm", option2);
// Get the "model_layers" tensor from prefill_part1_model
auto kcache = prefill_part1_model.GetInput("model_layers");
// Set that tensor as input of decode_part1_model
decode_part1_model.SetInput("model_layers", kcache);

Parameters:

dummy_names -- [in] A vector of tensor names to be set.

Returns:

A reference to the current Option object.

SetModelOffset

Option &tcim::Module::Option::SetModelOffset(size_t offset, size_t size)

Sets the model offset within the file or buffer.

This function allows model data to be stored at a non-zero offset within a file. It enables loading models that are embedded within larger files or combined with other data structures, such as:

  • Models packaged with metadata or other assets.

  • Multiple models concatenated in a single container file.

  • Models with prepended headers or format identifiers.

Example usage:
// Load a model that starts 1024 bytes into the file, with a total size of 50 MB
tcim::Module::Option option;
option.SetModelOffset(1024, 50 * 1024 * 1024);

// Load from the beginning of the file (default behavior);
// file_size is the total size of the file in bytes
option.SetModelOffset(0, file_size);

Parameters:
  • offset -- [in] The starting position (in bytes) of the model data within the file.

  • size -- [in] The size (in bytes) of the model data to be loaded from the specified offset.

Returns:

Reference to the Option object for method chaining.

Note

Important considerations:

  • The offset must be properly aligned according to the model format requirements.

  • The size parameter must match the exact size of the model data to be loaded.

  • Ensure that the file is large enough to contain [offset, offset + size) bytes.

  • This setting may affect file I/O performance depending on the storage medium.
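
The requirement that the file contain [offset, offset + size) bytes can be checked before calling SetModelOffset. The helper below is a hypothetical pre-check, not part of the library; it compares against file_size - offset so that the sum offset + size is never computed and cannot overflow.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical pre-check: true when [offset, offset + size) lies inside a
// file of file_size bytes. Written to avoid overflow in offset + size.
inline bool ModelRangeFits(std::size_t file_size, std::size_t offset,
                           std::size_t size) {
    return offset <= file_size && size <= file_size - offset;
}
```

Alignment requirements depend on the model format and are not covered by this check.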

EnableProfile

Option &tcim::Module::Option::EnableProfile(size_t profile_size = 1 << 25)

Enable profile mode. Internal use only.

SetInOutDftMemType

Option &tcim::Module::Option::SetInOutDftMemType(const std::string &mem_type)

Set input output default memory type. Internal use only.

Nested Class RunOption

class RunOption

Represents dynamic options used when initializing or loading binary model files. Internal use only.

RunOption

tcim::Module::RunOption::RunOption()

Default constructor for the RunOption class.

Rounds

RunOption &tcim::Module::RunOption::Rounds(const int64_t v)

Sets the rounds option. Internal use only. The maximum value is 32767.

CoreMask

RunOption &tcim::Module::RunOption::CoreMask(const int64_t v)

Sets the core_mask option. Internal use only.

The core_mask uses binary format where bit i represents core i:

  • core0 = 0x1, core1 = 0x2, core2 = 0x4, core3 = 0x8

  • core0+core1 = 0x3, all four cores = 0xF

  • 0 means no mask (use default configuration).
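
The bit layout above can be reproduced with a short helper that sets bit i for each selected core. MakeCoreMask is a hypothetical name used for illustration; it only builds the integer and does not call the library.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

// Hypothetical helper: sets bit i for every core index i in the list.
inline std::int64_t MakeCoreMask(std::initializer_list<int> cores) {
    std::int64_t mask = 0;
    for (int core : cores) {
        assert(core >= 0 && core < 63);  // keep the shift well-defined
        mask |= std::int64_t{1} << core;
    }
    return mask;
}
```

For example, MakeCoreMask({0, 1}) yields 0x3 and MakeCoreMask({0, 1, 2, 3}) yields 0xF, matching the values listed above.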

GetRounds

int64_t tcim::Module::RunOption::GetRounds() const

Gets the execution round option. Internal use only.

GetCoreMask

int64_t tcim::Module::RunOption::GetCoreMask() const

Gets the core_mask option. Internal use only.

Returns the core mask in binary format (bit i = core i).

Nested Class WeightManager

class WeightManager

Represents weight manager reference for Module.

For large language models such as Qwen, allocating a separate block of device memory for each weight can consume a substantial amount of memory. To save device memory, modules can share weight memory through a weight manager, either across multiple tasks within one module or across multiple modules that have the same weight values on the same Houmo device.

You can create this object by calling tcim::Module::WeightManager::CreateWeightManager, and load modules with the same weight manager to share the weight memory by calling tcim::Module::LoadModel.

Notes

When using tcim::Module::WeightManager::CreateWeightManager to create a weight manager object, the actual weight manager is not instantiated immediately. This object is initialized only when tcim::Module::LoadModel or tcim::Module::LoadFromFile is called with the provided Option object, which contains the WeightManager. This deferred instantiation allows for more efficient resource allocation during the model loading process.

Example

The following example shows how to create two modules and a weight manager, and then load models with the same weight manager.

// Create Module objects: module_wm_1 and module_wm_2
auto module_wm_1 = std::make_shared<tcim::Module>();
auto module_wm_2 = std::make_shared<tcim::Module>();
{
    // Create a weight manager
    auto weight_manager = tcim::Module::WeightManager::CreateWeightManager(0);
    // Create an Option object to set configurations for loading models
    tcim::Module::Option option(weight_manager);
    // Load model with the weight manager
    module_wm_1->LoadModel("tcim_resnet50.hmm", option);
    module_wm_2->LoadModel("tcim_resnet50.hmm", option);
    // After loading the models, the option and weight manager can be released.
}
module_wm_1.reset();
// The weights are released here
module_wm_2.reset();
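
The release order in the example resembles shared ownership: the weights stay alive until the last holder lets go. The sketch below models that with std::shared_ptr. This is an analogy under an assumption, since the library's actual sharing mechanism is not described here, and Weights, ModuleLike, and LoadWith are hypothetical stand-ins.

```cpp
#include <cassert>
#include <memory>

struct Weights {};  // hypothetical stand-in for shared weight memory

struct ModuleLike {  // hypothetical stand-in for a loaded module
    std::shared_ptr<Weights> weights;
};

// Hypothetical loader: the module keeps its own reference to the weights.
inline ModuleLike LoadWith(const std::shared_ptr<Weights>& weights) {
    return ModuleLike{weights};
}
```

In this model, dropping the manager handle after loading is harmless because each module still holds a reference, and the memory is freed only when the second module releases its reference.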

CreateWeightManager

static WeightManager tcim::Module::WeightManager::CreateWeightManager(int device_id = 0)

Creates a weight manager for sharing weight memory.

Parameters:

device_id -- [in] (Optional) The logical ID of the Houmo device on which the shared weights are stored and the model runs inference. Device 0 is used by default. You can retrieve the logical device IDs via the SMI tool. See "SMI Tool User Guide" for details.

Returns:

Returns a WeightManager object for managing shared weight memory.

static WeightManager tcim::Module::WeightManager::CreateWeightManager(const DevManager &dev_manager)

Creates a weight manager for sharing weight memory.

Parameters:

dev_manager -- [in] The devices that need to share weight memory. It is usually used with multi-device models (.hmms).

Returns:

Returns a WeightManager object for managing shared weight memory.

GetInitStatus

Status tcim::Module::WeightManager::GetInitStatus() const

Retrieves the status of the constructor function and related resources.

Returns:

Returns the status of the constructor function.

Module

tcim::Module::Module()

Default constructor.

tcim::Module::Module(const Module &module) = default

Default copy constructor.

Parameters:

module -- [in] The Module object to be copied.

tcim::Module::Module(const std::string &filename, const Option& = Option())

Constructs a module with the given binary model file (.hmm or .hmms) and configuration options.

Parameters:
  • filename -- [in] The file name and path of the binary model file.

  • options -- [in] The configuration options for module initialization, defined in Module::Option.

tcim::Module::Module(const void *data, uint64_t len, const Option& = Option())

Constructs a module from host memory with the given binary model file (.hmm or .hmms), size, and configuration options. The binary model file will be copied to Houmo device memory automatically for inference.

This API is used to load a confidential binary model file, ensuring its security and preventing unauthorized access. The model data should be decrypted before being passed to this API.

Parameters:
  • data -- [in] Pointer to the memory containing the binary model file.

  • len -- [in] The size of the memory in bytes.

  • options -- [in] The configuration options for module initialization, defined in Module::Option.

tcim::Module::Module(Module &&other) = default

Default move constructor.

Parameters:

other -- [in] The Module object from which the resources are transferred.

~Module

tcim::Module::~Module()

Module destructor.

DumpInOut

Status tcim::Module::DumpInOut(const std::string &path)

Dumps all current input and output tensors of the module to the specified directory.

Saves every input and output tensor as a .npy file (preserving the original tensor shape) and generates a model.json file in the target directory that is directly compatible with the TCIM tester (tests/tester/tester.py).

Output directory layout

<path>/
  model.json            -- tester-compatible JSON (Golden + Model sections)
  input_<name>.npy      -- one file per input tensor
  output_<name>.npy     -- one file per output tensor

Notes

  • The directory at path is created automatically if it does not exist.

  • All input tensors are transferred from device memory to contiguous host memory before being written to disk. Output tensors are already in contiguous host memory after Module::Run + Module::Sync.

  • The generated model.json can be passed directly to the tester via python tester.py model.hmm --json <path>/model.json.

Parameters:

path -- [in] Path to the output directory.

Returns:

Returns the status of the function call.

GetBackendName

const std::string &tcim::Module::GetBackendName()

Retrieves the backend name used for the inference of the current Module object.

Returns:

Returns the backend name of the current model.

GetCoreNum

int64_t tcim::Module::GetCoreNum() const

Retrieves the number of IPU cores of the Houmo device used for model inference.

Returns:

Returns the number of IPU cores used for model inference.

GetCustomMsg

std::string tcim::Module::GetCustomMsg() const

Retrieves the custom information embedded in the model at compilation time through the custom_msg parameter.

Returns:

Returns the custom information string.

GetDevInput

Tensor tcim::Module::GetDevInput(const std::string &name)

Gets input tensor on Houmo device with the given tensor name.

If the tensor name is set via Module::Option::SetDummyTensors, this function returns UNINITIALIZED.

Parameters:

name -- [in] The name of the input tensor to query for.

Returns:

Returns the input tensor.

Warning

This function does not automatically copy tensor data to contiguous host CPU memory. You need to call Tensor::ToHost and Tensor::CastTo to explicitly copy and convert tensor data to host CPU memory if needed.

GetDevOutput

Tensor tcim::Module::GetDevOutput(const std::string &name)

Gets the original output data from pre-allocated memory with the given tensor name.

Parameters:

name -- [in] The name of the output tensor to query for.

Returns:

Returns the output tensor.

Warning

This function does not automatically copy tensor data to contiguous host CPU memory. You need to call Tensor::ToHost and Tensor::CastTo to explicitly copy and convert tensor data to host CPU memory if needed.

GetInitStatus

Status tcim::Module::GetInitStatus() const

Retrieves the status of the constructor function and related resources.

Returns:

Returns the status of the constructor function.

GetInput

Tensor tcim::Module::GetInput(const std::string &name)

Gets input tensor with the given tensor name.

If the tensor name is set via Module::Option::SetDummyTensors, this function returns UNINITIALIZED.

Parameters:

name -- [in] The name of the input tensor to query for.

Returns:

Returns the input tensor.

Warning

This function is deprecated and will be removed in a future release. Use Module::GetDevInput instead.

Status tcim::Module::GetInput(const std::string &name, Tensor &tensor)

Gets input data from pre-allocated memory with the given tensor name. The input data includes input tensors defined by the Tensor class.

Notes

  • If the original tensor data in host memory is not stored in a contiguous layout, the tensor data will be copied automatically to contiguous host CPU memory.

  • Before calling this function, you must pre-allocate the tensor parameter by creating a Tensor object. You can call tcim::Tensor::CreateDeviceTensor or tcim::Tensor::CreateHostTensor to create a tensor on Houmo device or host.

  • The pre-allocated tensor must match the model's input tensor in format, data type, and shape, as retrieved via Module::GetInputInfo. This ensures that the input tensor data can be correctly stored.

  • If the tensor name is set via Module::Option::SetDummyTensors, this function returns INVALID_ARGUMENT.

Parameters:
  • name -- [in] The name of the input tensor to query for.

  • tensor -- [out] The input tensor.

Returns:

Returns the status of the function call.

Warning

This function is deprecated and will be removed in a future release. Use Module::GetDevInput instead.

GetInputInfo

TensorInfo tcim::Module::GetInputInfo(const std::string &name) const

Gets the tensor information, such as tensor shape, data type, and format with the given input tensor name.

Parameters:

name -- [in] The name of the input tensor to query for.

Returns:

Returns the tensor information of the tensor.

GetInputName

std::string tcim::Module::GetInputName(int index) const

Gets the name of the index-th input tensor.

Parameters:

index -- [in] The position of the input tensor in the network model to query for.

Returns:

Returns the name of the index-th input tensor.

GetInputNum

size_t tcim::Module::GetInputNum() const

Gets the total number of input tensors in the network model.

Returns:

Returns the total number of input tensors.

GetMemSize

const std::vector<size_t> &tcim::Module::GetMemSize()

Retrieves the memory size of the model in bytes.

Currently, this function is not supported.

Returns:

Returns the memory sizes of the model in bytes. The elements of the returned vector represent the memory sizes of the weights, workspace, inputs, and outputs, respectively.

GetModelVersion

const std::string &tcim::Module::GetModelVersion()

Retrieves the date when the model was compiled.

Returns:

Returns the date when the model was compiled.

GetOutput

Tensor tcim::Module::GetOutput(const std::string &name)

Gets output data in a contiguous memory layout from pre-allocated memory with the given tensor name, and copies the data to host CPU memory as a new tensor.

Parameters:

name -- [in] The name of the output tensor to query for.

Returns:

Returns the output tensor.

Note

If the original tensor data in host memory is not stored in a contiguous layout, the tensor data will be copied automatically to contiguous host CPU memory.

Tensor tcim::Module::GetOutput(const std::string &name, Device device)

Gets output data from pre-allocated memory with the given tensor name.

Parameters:
  • name -- [in] The name of the output tensor to query for.

  • device -- [in] The device type on which the output data is stored.

Returns:

Returns the output tensor.

Warning

This function is deprecated and will be removed in a future release. Use Module::GetDevOutput instead.

Status tcim::Module::GetOutput(const std::string &name, Tensor &tensor)

Gets output data from pre-allocated memory with the given tensor name. The output data includes output tensors defined by the Tensor class.

Parameters:
  • name -- [in] The name of the output tensor.

  • tensor -- [out] The output Tensor object.

Returns:

Returns the status of the function call.

Note

This function should be called after Module::Run and Module::Sync.

GetOutputInfo

TensorInfo tcim::Module::GetOutputInfo(const std::string &name) const

Gets the information about the output tensor of the model inference, such as tensor shape, data type, and format with the given output tensor name.

Parameters:

name -- [in] The name of the output tensor.

Returns:

Returns the information of the output tensor.

GetOutputName

std::string tcim::Module::GetOutputName(int index) const

Gets the name of the index-th output tensor.

Parameters:

index -- [in] The position of the output tensor in the network model to query for.

Returns:

Returns the name of the index-th output tensor.

GetOutputNum

size_t tcim::Module::GetOutputNum() const

Gets the total number of output tensors in the network model.

Returns:

Returns the total number of output tensors.

GetWorkspaces

std::vector<Buffer> &tcim::Module::GetWorkspaces()

Retrieves the workspace buffers allocated for the module. Internal use only.

Returns:

Returns the workspace buffers.

LoadDumpInput

Status tcim::Module::LoadDumpInput(const std::string &path)

Loads input tensors previously saved by DumpInOut and feeds them directly into this Module.

Reads the model.json produced by a prior DumpInOut call, resolves the .npy file path of each input tensor listed in the Golden.inputs section, loads the data into a temporary host Tensor, and forwards it to the Module via Module::SetInput.

Notes

  • The path must point to a directory produced by a prior DumpInOut call.

  • If the json contains an input name that is not recognised by this Module, a warning is logged and loading continues for the remaining inputs - the function still returns Status::OK.

  • Module must be initialised (LoadModel called) before invoking this function.

Parameters:

path -- [in] Path to the directory that contains model.json and the associated .npy files.

Returns:

Returns the status of the function call.

LoadModel

Status tcim::Module::LoadModel(const std::string &filename, const Option& = Option())

Loads a model with the given binary model file (.hmm or .hmms) and configuration options.

Parameters:
  • filename -- [in] The file name and path of the binary model file.

  • Option -- The configuration options for loading the binary model file, defined in Module::Option.

Returns:

Returns the status of the function call.

Status tcim::Module::LoadModel(const void *data, uint64_t len, const Option& = Option())

Loads a model from host memory with the given binary model file (.hmm or .hmms), size, and configuration options. The binary model file will be copied to Houmo device memory automatically for inference.

This API is used to load a confidential binary model file, ensuring its security and preventing unauthorized access. The binary model data should be decrypted before being passed to this API.

Parameters:
  • data -- [in] The data buffer of the binary model file.

  • len -- [in] The buffer length.

  • Option -- The configuration options for loading the binary model file, defined in Module::Option.

Returns:

Returns the status of the function call.

operator bool

explicit tcim::Module::operator bool() const noexcept

Checks if the current Module object is initialized.

Returns:

Returns true if the current Module object is initialized. Otherwise, returns false.

Note

You must initialize the modules first before any further processing on modules. See Module for detailed information on how to initialize modules.
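
The initialization check can be illustrated with a small standalone sketch. MockHandle below is a hypothetical stand-in, not tcim::Module; it only assumes that a handle counts as initialized when it owns an implementation object, and shows how explicit operator bool and operator! are typically used together.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a handle-like class such as tcim::Module.
// Assumption: the handle is "initialized" when it owns an implementation.
class MockHandle {
 public:
  MockHandle() = default;  // creates an uninitialized handle

  static MockHandle Create() {  // creates an initialized handle
    MockHandle h;
    h.impl_ = std::make_shared<int>(0);
    return h;
  }

  explicit operator bool() const noexcept { return impl_ != nullptr; }
  bool operator!() const noexcept { return impl_ == nullptr; }

 private:
  std::shared_ptr<int> impl_;
};

// Returns true only when the handle is safe to use.
inline bool IsReady(const MockHandle& h) {
  if (!h) {  // uses operator!
    return false;
  }
  // The conversion is explicit, so a cast is needed outside of conditions.
  return static_cast<bool>(h);
}
```

Because the conversion is explicit, a handle can be tested in an if statement but cannot be silently converted to an integer or compared with arithmetic types.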

operator!

bool tcim::Module::operator!() const noexcept

Checks if the current Module object is not initialized.

Returns:

Returns true if the current Module object is not initialized. Otherwise, returns false.

Note

You must initialize the modules first before any further processing on modules. See Module for detailed information on how to initialize modules.

operator!=

bool tcim::Module::operator!=(const Module &other) const noexcept

Checks whether the current Module object refers to a different implementation than another Module object.

Parameters:

other -- [in] The Module object to compare with.

Returns:

Returns true if the current Module object does not refer to the same implementation as the other. Otherwise, returns false.

operator=

Module &tcim::Module::operator=(const Module &other) = default

Copy assignment operator.

Assigns the information of a Module object to the current Module object.

Parameters:

other -- [in] The Module object to copy from.

Returns:

Reference to the current Module object.

Module &tcim::Module::operator=(Module &&other) = default

Default move assignment operator.

Parameters:

other -- [in] The Module object from which the resources are transferred.

Returns:

Reference to the current object.

operator==

bool tcim::Module::operator==(const Module &other) const noexcept

Checks whether the current Module object refers to the same implementation as another Module object.

Parameters:

other -- [in] The Module object to compare with.

Returns:

Returns true if the current Module object refers to the same implementation as the other. Otherwise, returns false.

Run

Status tcim::Module::Run(bool sync = false, const RunOption& = RunOption())

Runs inference on the model.

Parameters:
  • sync -- [in] (Optional) Specifies whether to run the model synchronously. If set to true, the model runs synchronously. The default value is false.

  • RunOption -- The configuration options for model inference.

Returns:

Returns the status of the function call.

SetDevInput

Status tcim::Module::SetDevInput(const std::string &name, const Tensor &tensor)

Sets the input data to the pre-allocated memory on Houmo device with the given tensor name. The input data includes input tensors defined with the Tensor class.

Notes

  • Before calling this function, you must pre-allocate the tensor parameter by creating a Tensor object. You can call tcim::Tensor::CreateDeviceTensor to create a tensor on Houmo device.

  • For multi-device models, the tensor must be on COMPND device.

  • The pre-allocated tensor must match all attributes of the corresponding model input tensor, with the same name specified in the name parameter, as retrieved via Module::GetInputInfo, and must be allocated on the same device type (Houmo device) and device ID as that input tensor.

Parameters:
  • name -- [in] The name of the input tensor.

  • tensor -- [in] The input Tensor object.

Returns:

Returns the status of the function call.

Warning

This function does not automatically copy tensor data to contiguous host CPU memory. You need to call Tensor::ToHost and Tensor::CastTo to explicitly copy and convert tensor data to host CPU memory if needed.

SetDevOutput

Status tcim::Module::SetDevOutput(const std::string &name, Tensor &tensor)

Sets the device output data to the pre-allocated memory on Houmo device with the given tensor name. This operation must be an in-place operation.

Note

  • Before calling this function, you must pre-allocate the tensor parameter by creating a Tensor object. You can call tcim::Tensor::CreateDeviceTensor to create a tensor on Houmo device.

  • For multi-device models, the tensor must be on COMPND device.

  • The pre-allocated tensor must match all attributes of the corresponding model output tensor, with the same name specified in the name parameter, as retrieved via Module::GetOutputInfo, and must be allocated on the same device type (Houmo device) and device ID as that output tensor.

Parameters:
  • name -- [in] The name of the output tensor.

  • tensor -- [in] The output Tensor object.

Returns:

Returns the status of the function call.

Warning

This function does not automatically copy tensor data to contiguous host CPU memory. You need to call Tensor::ToHost and Tensor::CastTo to explicitly copy and convert tensor data to host CPU memory if needed.

SetInput

Status tcim::Module::SetInput(const std::string &name, const Tensor &tensor)

Sets the input data to the pre-allocated memory on host or Houmo device with the given tensor name. The input data includes input tensors defined with the Tensor class.

Notes

This function can only be used with following data restrictions:

For "quantized fixed" data precision type, you must quantize the model before calling this function.

Data Format         Device Type   Data Precision      Data Alignment Required
DataFmt::YUV420SP   CPU           fixed               NO
DataFmt::YUV422SP   CPU           fixed               NO
DataFmt::YUV444SP   CPU           fixed               NO
DataFmt::ND         CPU           quantized fixed *   NO
DataFmt::YUV420SP   HDPL          fixed               YES
DataFmt::YUV422SP   HDPL          fixed               YES
DataFmt::YUV444SP   HDPL          fixed               YES
DataFmt::ND         HDPL          quantized fixed *   YES

Parameters:
  • name -- [in] The name of the input tensor.

  • tensor -- [in] The input Tensor object. For non-image data, the input must be quantized.

Returns:

Returns the status of the function call.

Note

If the original tensor data in host memory is not stored in a contiguous layout, the tensor data will be copied automatically to contiguous host CPU memory.
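
The restrictions listed for SetInput can be read as a lookup from format and device type to precision and alignment requirement. The sketch below models that lookup in standalone C++. The enums Fmt and Dev, the struct InputRule, and the function LookupInputRule are hypothetical names introduced for illustration and are not part of the tcim API.

```cpp
#include <cassert>
#include <string>

// Hypothetical enums mirroring the names used in the restrictions table.
enum class Fmt { YUV420SP, YUV422SP, YUV444SP, ND };
enum class Dev { CPU, HDPL };

struct InputRule {
  std::string precision;    // "fixed" or "quantized fixed"
  bool alignment_required;  // true when data alignment is required
};

// Looks up the table row for a (format, device) pair.
inline InputRule LookupInputRule(Fmt fmt, Dev dev) {
  InputRule rule;
  // Image formats use "fixed"; non-image (ND) data must be quantized.
  rule.precision = (fmt == Fmt::ND) ? "quantized fixed" : "fixed";
  // Per the table, only tensors on the HDPL device require alignment.
  rule.alignment_required = (dev == Dev::HDPL);
  return rule;
}
```

A caller could use such a helper to decide whether input data needs to be quantized or aligned before it is handed to SetInput.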

SetOutput

Status tcim::Module::SetOutput(const std::string &name, Tensor &tensor)

Sets the output data to the pre-allocated memory on Houmo device with the given tensor name. The output data includes output tensors defined with the Tensor class.

Note

The TensorInfo of the tensor set in this function must match the TensorInfo returned by Module::GetOutputInfo for the tensor with the same name specified in the name parameter.

Parameters:
  • name -- [in] The name of the output tensor.

  • tensor -- [in] The output Tensor object.

Returns:

Returns the status of the function call.

Note

If the original tensor data in host memory is not stored in a contiguous layout, the tensor data will be copied automatically to contiguous host CPU memory.

SetStream

Status tcim::Module::SetStream(Stream &stream)

Sets the runtime stream used for model inference.

Parameters:

stream -- [in] A reference to the stream used for model inference.

Returns:

Returns the status of the function call.

Sync

Status tcim::Module::Sync()

Waits until the previous operation is completed.

Returns:

Returns the status of the function call.

Note

If the auto_yield parameter was set to true when initializing the Stream object via the constructor, the IPU resources used by this stream will be automatically released after this function completes.

LoadFromFile

static Module tcim::Module::LoadFromFile(const std::string &filename, const Option& = Option())

Creates a module with the given binary model file (.hmm or .hmms) and configuration options.

Parameters:
  • filename -- [in] The file name and path of the binary model file.

  • Option -- The configuration options for loading the binary model file, defined in Module::Option.

Returns:

Returns a new Module object.

LoadFromMem

static Module tcim::Module::LoadFromMem(const void *data, uint64_t len, const Option& = Option())

Creates a module from host memory with the given binary model file (.hmm or .hmms), size, and configuration options. The file will be copied to Houmo device memory automatically for inference.

This API is used to load a confidential binary model file, ensuring its security and preventing unauthorized access. The model data should be decrypted before being passed to this API.

Parameters:
  • data -- [in] Pointer to the memory containing the binary model file.

  • len -- [in] The size of the memory in bytes.

  • Option -- The configuration options for loading the binary model file, defined in Module::Option.

Returns:

Returns a new Module object.

Note

Ensure that the binary model file to be loaded is a precise binary match to the one generated after compilation. Any modification may lead to undefined behavior after calling this API.

LoadModelInfo

static std::string tcim::Module::LoadModelInfo(const std::string &filename)

Loads and returns the model information as a JSON string from a binary model file (.hmm or .hmms).

This API parses the specified binary model file and extracts its metadata, returning a JSON-formatted string that describes the model information (e.g., version, inputs/outputs, tensor shapes, and other attributes).

Parameters:

filename -- [in] The file name and path of the binary model file.

Returns:

Returns a JSON string containing the model information.

Note

Ensure that the binary model file is a precise binary match to the one generated after compilation. Any modification may lead to undefined behavior or incorrect metadata being reported.

static std::string tcim::Module::LoadModelInfo(const void *data, size_t size)

Loads and returns the model information as a JSON string from host memory.

This API parses the binary model content provided in host memory and extracts its metadata, returning a JSON-formatted string that describes the model information (e.g., version, inputs/outputs, tensor shapes, and other attributes).

This API can be used with confidential model data. The model bytes should be decrypted before being passed to this API to ensure correct parsing.

Parameters:
  • data -- [in] Pointer to the memory containing the binary model file.

  • size -- [in] The size of the memory in bytes.

Returns:

Returns a JSON string containing the model information.

Note

Ensure that the provided memory buffer is a precise binary match to the compiled model output. Any modification may lead to undefined behavior or incorrect metadata being reported.

Class Stream

class Stream

Represents a stream that is used by the module for inference.

Notes

  • If a Stream object is defined as a global variable, it must be explicitly destroyed before the main function exits.

  • When creating a Stream object, the actual stream is not instantiated immediately. This object is initialized only when tcim::Module::SetStream is called with the Stream object. This deferred instantiation allows for more efficient resource allocation during the model loading process.

Stream

explicit tcim::Stream::Stream(bool auto_yield = true)

Constructs a Stream object.

Parameters:

auto_yield -- [in] Specifies whether to automatically release IPU core resources after all tasks in the stream are completed. Defaults to true.

tcim::Stream::Stream(const Stream&) = default

Default copy constructor for the Stream class.

Parameters:

Stream -- The Stream object to be copied.

tcim::Stream::Stream(Stream&&) = default

Default move constructor for the Stream class.

Parameters:

Stream -- The Stream object to be moved from.

GetInitStatus

Status tcim::Stream::GetInitStatus() const

Retrieves the initialization status of the Stream object and related resources.

Returns:

Returns the initialization status of the Stream object.

operator=

Stream &tcim::Stream::operator=(const Stream&) = default

Default copy assignment operator for the Stream class.

Parameters:

Stream -- The Stream object to be copied.

Returns:

A reference to the Stream object assigned the values of another.

Stream &tcim::Stream::operator=(Stream&&) = default

Default move assignment operator for the Stream class.

Parameters:

Stream -- The Stream object to be moved from.

Returns:

A reference to the Stream object assigned the values of another.

Sync

Status tcim::Stream::Sync()

Waits for all tasks in the stream to complete.

Returns:

Returns the status of the function call.

Note

If the auto_yield parameter was set to true when initializing the Stream object via the constructor, the IPU resources used by this stream will be automatically released after this function completes.

SyncYield

Status tcim::Stream::SyncYield()

Releases the IPU core resources that are used by the stream after performing Stream::Sync.

Returns:

Returns the status of the function call.

Note

Before releasing the IPU resources, the Stream::Sync function is automatically invoked to ensure all operations in the stream are completed.

Class Tensor

class Tensor

Represents an input, output, or intermediate computation result.

Tensor

tcim::Tensor::Tensor()

Default constructor. The object constructed from this interface is invalid.

tcim::Tensor::Tensor(const Tensor &other) = default

Default copy constructor.

Parameters:

other -- [in] Other Tensor object.

tcim::Tensor::Tensor(const TensorInfo &info, const Buffer &buffer)

Constructs a Tensor object with the given TensorInfo and Buffer.

Parameters:
  • info -- [in] The information of the tensor.

  • buffer -- [in] The buffer containing the tensor data.

tcim::Tensor::Tensor(Tensor &&other) = default

Default move constructor.

Parameters:

other -- [in] Other Tensor object.

AsFormular

Tensor tcim::Tensor::AsFormular() const

Creates a new Tensor object with normalized TensorInfo.

The returned Tensor shares the same underlying buffer but has a normalized TensorInfo (via TensorInfo::AsFormular()).

Returns:

Returns a new Tensor object with normalized info.

AsType

Tensor tcim::Tensor::AsType(DataType dtype, bool auto_cast = true) const

Creates a new tensor with the specified target data type and, optionally, casts the data of the original tensor to the target data type and stores the cast data in the new tensor.

The new tensor is created with a contiguous memory layout in host memory. The new tensor does not contain any quantization information (QuantInfo).

Note:

If auto_cast is set to false, the new tensor is created without storing any data. To store data from the original Tensor object, set auto_cast to true, which automatically casts the data to the target data type and stores it in the new tensor.

Parameters:
  • dtype -- [in] The target data type for the new tensor, defined in DataType.

  • auto_cast -- [in] Specifies whether to automatically cast the data of the original Tensor object to the target data type specified in dtype. If set to true, the data is automatically cast to the target data type with Tensor::CastTo and stored in the new tensor. If set to false, the new tensor is created with the target data type without performing data casting. Defaults to true.

Returns:

Returns a new Tensor object with the specified data type without quantization information.

Note

This function is only valid for non-image (multi-dimensional) tensors.

Buffer

tcim::Buffer &tcim::Tensor::Buffer() const

Retrieves the buffer holding the data of the tensor.

Returns:

Returns a reference to the buffer holding the tensor data.

CastTo

Status tcim::Tensor::CastTo(Tensor &tensor) const

Casts the data of the current Tensor object to the specified data type, and stores the casted data in the given target tensor.

The supported data type conversions are as follows:

From/To              INT8, INT16, INT32   FLOAT32     FLOAT16
INT8, INT16, INT32   Not Supported        Supported   Supported
FLOAT32              Supported            NA          Supported
FLOAT16              Supported            Supported   NA

Parameters:

tensor -- [inout] The target tensor where the casted data is stored. The data type of this tensor determines the target type for the cast operation. The data of the current Tensor object is cast to the data type specified in this parameter, and the casted data is stored in this parameter.

Returns:

Returns the status of the function call.

Note

This function only supports contiguous tensors in host memory.
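
The conversion rules listed for CastTo can be checked up front, before the cast is attempted. The following standalone sketch encodes them as a predicate. The enum DType and the functions IsInteger and IsCastSupported are hypothetical names used only for illustration; the real DataType enumeration may contain more values.

```cpp
#include <cassert>

// Hypothetical enum covering only the types named in the conversion table.
enum class DType { INT8, INT16, INT32, FLOAT32, FLOAT16 };

inline bool IsInteger(DType t) {
  return t == DType::INT8 || t == DType::INT16 || t == DType::INT32;
}

// Mirrors the conversion table: integer-to-integer casts are not supported,
// and casting a floating-point type to itself is marked NA.
inline bool IsCastSupported(DType from, DType to) {
  if (IsInteger(from) && IsInteger(to)) return false;  // Not Supported
  if (from == to) return false;                        // NA
  return true;                                         // Supported
}
```

Under these rules, converting between two integer types would need an intermediate floating-point tensor, which is an inference from the table rather than a documented recommendation.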

Clone

Tensor tcim::Tensor::Clone(bool auto_copy = true)

Creates a copy of the current Tensor object.

Parameters:
  • stream -- [in] The stream used for performing the deep copy. This is used only if auto_copy is true.

  • auto_copy -- [in] Determines if the data stored in the buffer is copied to the new Tensor object.

    • If set to true (default), a new buffer is allocated, and both the data stored in the buffer and its associated metadata (such as size and memory type) are copied to the new buffer.

    • If set to false, a new buffer is allocated, and only the metadata (such as size and memory type) is copied to the new buffer, but its memory is uninitialized and does not contain the original data.

Returns:

Returns a new Tensor object as a copy of the current one.

CopyTo

Status tcim::Tensor::CopyTo(Tensor &dst) const

Copies the tensor data of the current Tensor object to another Tensor object.

Notes

  • Before calling this function, make sure the format, shape, and data type of source and target tensors are identical and valid.

  • This function automatically handles data alignment when required.

Parameters:

dst -- [inout] The Tensor object to copy to.

Returns:

Returns the status of the function call.

Note

Data may be modified during copying from the source to target due to differences in device type, contiguity requirements, and other data characteristics.

Data

void *tcim::Tensor::Data() const

Retrieves the pointer to the memory address where the tensor data is stored.

Returns:

Returns a pointer to the data stored in the tensor.

Device

tcim::Device tcim::Tensor::Device() const

Retrieves the device type on which the tensor data is stored.

This function indicates whether the tensor data is stored on host CPU or Houmo device.

Returns:

Returns the device type on which the tensor data is stored.

DeviceId

int tcim::Tensor::DeviceId() const

Retrieves the logical ID of the Houmo device on which the tensor is stored.

Returns:

The logical ID of the Houmo device on which the tensor is stored.

GetInitStatus

Status tcim::Tensor::GetInitStatus() const

Retrieves the status of the constructor function and related resources.

Returns:

Returns the status of the constructor function.

Info

const TensorInfo &tcim::Tensor::Info() const

Retrieves the information of the tensor associated with the current Tensor object.

Returns:

Returns the information of the tensor.

MemSize

size_t tcim::Tensor::MemSize() const

Retrieves the actual memory size allocated for the tensor.

If the tensor is stored within a buffer, the buffer size, as returned by Buffer::Size, should be equal to or greater than the value returned by this function.

Returns:

Returns the minimum memory size required to store the tensor data.

operator!=

bool tcim::Tensor::operator!=(const Tensor &other) const noexcept

Checks if the current Tensor object is not equal to another Tensor object.

Parameters:

other -- [in] The Tensor object to compare with.

Returns:

Returns true if the two Tensor objects are not equal. Otherwise, returns false.

operator=

Tensor &tcim::Tensor::operator=(const Tensor &other) = default

Default copy assignment operator. Assigns the tensor data of a Tensor object to the current Tensor object.

Parameters:

other -- [in] The Tensor object to copy from.

Returns:

Reference to the current Tensor object.

Tensor &tcim::Tensor::operator=(Tensor &&other) = default

Default move assignment operator. Transfers the tensor data of another Tensor object to the current Tensor object.

Parameters:

other -- [in] The Tensor object to move from.

Returns:

Reference to the current Tensor object.

operator==

bool tcim::Tensor::operator==(const Tensor &other) const noexcept

Checks if the current Tensor object is equal to another Tensor object.

Parameters:

other -- [in] The Tensor object to compare with.

Returns:

Returns true if the two Tensor objects are equal. Otherwise, returns false.

SelectBatch

Tensor tcim::Tensor::SelectBatch(const std::vector<int64_t> &d) const

Selects a batch of elements along the batch dimension from the original Tensor object, and returns a new Tensor object that shares the underlying memory with the original Tensor object.

The returned tensor and the original tensor use the same buffer. However, the actual memory addresses accessed vary depending on the batch selection, as different batch indices map to different memory regions within the same buffer.

Parameters:

d -- [in] A reference to a vector of indices specifying the batch elements to select.

Returns:

Returns a new Tensor object that references the selected batch of elements from the original tensor.

Warning

The returned tensor becomes invalid if the original tensor is destroyed. You must ensure that the original tensor remains valid for a longer duration than the returned tensor to avoid undefined behavior.
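
How different batch indices map to different regions of one shared buffer can be sketched with plain offset arithmetic. The standalone helper below is hypothetical (BatchOffsets is not a tcim function) and assumes a contiguous row-major tensor whose first dimension is the batch dimension; the actual device layout may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Computes the element offset of each selected batch inside a shared buffer.
// Assumption: contiguous row-major layout with the batch as dimension 0.
inline std::vector<int64_t> BatchOffsets(const std::vector<int64_t>& shape,
                                         const std::vector<int64_t>& batches) {
  assert(!shape.empty());
  int64_t batch_stride = 1;  // number of elements in one batch
  for (std::size_t i = 1; i < shape.size(); ++i) batch_stride *= shape[i];

  std::vector<int64_t> offsets;
  offsets.reserve(batches.size());
  for (int64_t b : batches) {
    assert(b >= 0 && b < shape[0]);  // batch index must be in range
    offsets.push_back(b * batch_stride);
  }
  return offsets;
}
```

Because only offsets into the original buffer are involved, no data is copied, which is why the original tensor has to outlive any view derived from it.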

SelectROI

Tensor tcim::Tensor::SelectROI(const std::vector<int64_t> &roi_start, const std::vector<int64_t> &shape) const

Selects a Region of Interest (ROI) from the current Tensor and creates a new Tensor.

This method supports N-dimensional (ND) Tensors. The roi_start and shape parameters define the region to be extracted.

[Figure: selectroi.png]

Example
// Assume 'tensor' is a 3D Tensor with shape [10, 20, 30]
std::vector<int64_t> roi_start = {2, 3, 5};    // Start indices for each dimension
std::vector<int64_t> shape = {3, 7, 20}; // Shape of the ROI
//  - Dimension 1: start=2, size=3  (elements 2, 3, 4)
//  - Dimension 2: start=3, size=7  (elements 3, 4, 5, 6, 7, 8, 9)
//  - Dimension 3: start=5, size=20 (elements 5, 6, ..., 24)
Tensor roi_tensor = tensor.SelectROI(roi_start, shape);
// roi_tensor will have shape [3, 7, 20]

Parameters:
  • roi_start -- [in] A std::vector<int64_t> representing the starting indices of the ROI. The size of roi_start must match the number of dimensions of the Tensor. Each element in roi_start specifies the starting index for the corresponding dimension: [start_dim1, start_dim2, ..., start_dimN].

  • shape -- [in] A std::vector<int64_t> representing the shape of the ROI. The size of shape must match the number of dimensions of the Tensor. Each element in shape specifies the size (number of elements) of the ROI along the corresponding dimension: [size_dim1, size_dim2, ..., size_dimN].

Returns:

A new Tensor containing the data from the selected ROI. Returns an empty Tensor if the ROI is invalid (e.g., out of bounds). May throw an exception in some implementations for invalid ROI.

Note

This method does not modify the original Tensor. It creates a new Tensor containing the ROI data.
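
The index arithmetic behind an N-dimensional ROI can be sketched without the library. The function below, ExtractRoi, is a hypothetical standalone helper, not the SelectROI implementation. It assumes a contiguous row-major float array and copies the selected region, returning an empty vector when the ROI is invalid.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Copies an N-dimensional ROI out of a contiguous row-major array.
// Returns an empty vector when the arguments are inconsistent or out of bounds.
inline std::vector<float> ExtractRoi(const std::vector<float>& data,
                                     const std::vector<int64_t>& dims,
                                     const std::vector<int64_t>& roi_start,
                                     const std::vector<int64_t>& roi_shape) {
  const std::size_t n = dims.size();
  if (n == 0 || roi_start.size() != n || roi_shape.size() != n) return {};

  // Row-major strides: the last dimension is contiguous.
  std::vector<int64_t> strides(n, 1);
  for (std::size_t i = n - 1; i > 0; --i) strides[i - 1] = strides[i] * dims[i];
  if (static_cast<int64_t>(data.size()) != strides[0] * dims[0]) return {};

  int64_t total = 1;
  for (std::size_t i = 0; i < n; ++i) {
    if (roi_start[i] < 0 || roi_shape[i] <= 0 ||
        roi_start[i] + roi_shape[i] > dims[i]) {
      return {};
    }
    total *= roi_shape[i];
  }

  std::vector<float> out;
  out.reserve(static_cast<std::size_t>(total));
  std::vector<int64_t> idx(n, 0);  // current position inside the ROI
  for (int64_t k = 0; k < total; ++k) {
    int64_t offset = 0;
    for (std::size_t i = 0; i < n; ++i) {
      offset += (roi_start[i] + idx[i]) * strides[i];
    }
    out.push_back(data[static_cast<std::size_t>(offset)]);
    // Advance the position like an odometer, last dimension fastest.
    for (std::size_t i = n; i-- > 0;) {
      if (++idx[i] < roi_shape[i]) break;
      idx[i] = 0;
    }
  }
  return out;
}
```

For a [10, 20, 30] array with roi_start {2, 3, 5} and shape {3, 7, 20}, the first copied element sits at flat offset 2*600 + 3*30 + 5 = 1295.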

SplitYUV

Status tcim::Tensor::SplitYUV(Tensor &y, Tensor &uv) const

Splits a YUV-formatted tensor into separate Y and UV tensors without additional memory allocation.

This function extracts the Y (luminance) and UV (chrominance) components from a YUV tensor formatted as YUV420SP, YUV422SP, or YUV444SP. The resulting Y and UV tensors are stored as non-image format tensors and share the same underlying memory buffer as the original YUV tensor, avoiding additional memory allocation.

Parameters:
  • y -- [out] Reference to a non-image tensor that stores the Y component.

  • uv -- [out] Reference to a non-image tensor that stores the interleaved UV components.

Returns:

Returns the status of the function call.

Note

  • The split tensors share the same underlying memory as the original YUV tensor, so their lifecycle is tied to the original tensor.

  • This function can be used to split YUV tensors in both host and device memory.
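
The relationship between the Y and UV parts of a semi-planar image can be sketched as byte ranges inside one buffer. The helper below is hypothetical (SplitPlanes and YuvPlanes are illustration-only names). It assumes 8-bit samples, a tightly packed layout with no row padding, even width and height, and a UV plane that directly follows the Y plane; alignment applied on the Houmo device may change the actual sizes.

```cpp
#include <cassert>
#include <cstddef>

enum class YuvFmt { YUV420SP, YUV422SP, YUV444SP };

struct YuvPlanes {
  std::size_t y_offset;   // byte offset of the Y plane in the shared buffer
  std::size_t y_size;     // byte size of the Y plane
  std::size_t uv_offset;  // byte offset of the interleaved UV plane
  std::size_t uv_size;    // byte size of the interleaved UV plane
};

// Computes plane ranges for a tightly packed 8-bit semi-planar image.
inline YuvPlanes SplitPlanes(YuvFmt fmt, std::size_t width, std::size_t height) {
  YuvPlanes p{};
  p.y_offset = 0;
  p.y_size = width * height;  // one luma byte per pixel
  p.uv_offset = p.y_size;     // assumption: UV directly follows Y
  switch (fmt) {
    case YuvFmt::YUV420SP: p.uv_size = p.y_size / 2; break;  // 2x2 subsampling
    case YuvFmt::YUV422SP: p.uv_size = p.y_size; break;      // 2x1 subsampling
    case YuvFmt::YUV444SP: p.uv_size = p.y_size * 2; break;  // no subsampling
  }
  return p;
}
```

Both ranges refer to the same buffer, which matches the documented behavior that the split tensors share memory with the original tensor.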

ToHost

Tensor tcim::Tensor::ToHost(bool to_contiguous = false) const

Retrieves or creates a tensor in host memory with tensor data based on the current Tensor object.

The following table summarizes the behavior of this function:

  • Original Tensor Location : The memory location of the original tensor.

  • Is Contiguous : Whether the memory layout of the original tensor is contiguous.

  • to_contiguous Setting : The value of the to_contiguous parameter.

  • Returned Tensor : Describes the returned tensor after performing this function, including if the returned tensor has contiguous or non-contiguous memory layout on host.

Original Tensor Location   Is Contiguous   to_contiguous Setting   Returned Tensor
On Houmo device memory     Yes             true/false              New tensor, contiguous.
On Houmo device memory     No              true                    New tensor, contiguous.
On Houmo device memory     No              false                   New tensor, non-contiguous.
On Host memory             Yes             true/false              Original tensor, contiguous.
On Host memory             No              true                    New tensor, contiguous.
On Host memory             No              false                   Original tensor, non-contiguous.

Parameters:

to_contiguous -- [in] Specifies whether to return the tensor with a contiguous memory layout. If set to true, the returned tensor is stored contiguously in host memory. If set to false, the returned tensor preserves its original memory layout. Defaults to false.

Returns:

Returns a Tensor object representing the tensor stored in host memory.
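
The behavior summarized for ToHost reduces to two boolean rules, which the standalone model below encodes. ModelToHost and ToHostResult are hypothetical names used only to restate the documented cases; they do not call into tcim.

```cpp
#include <cassert>

struct ToHostResult {
  bool is_new_tensor;  // true: a new tensor is created; false: original returned
  bool is_contiguous;  // memory layout of the returned tensor
};

// Restates the documented cases as two rules.
inline ToHostResult ModelToHost(bool on_device, bool is_contiguous,
                                bool to_contiguous) {
  ToHostResult r;
  // A device tensor is always copied to a new host tensor. A host tensor is
  // copied only when a non-contiguous layout must be made contiguous.
  r.is_new_tensor = on_device || (!is_contiguous && to_contiguous);
  // The result is contiguous if the source already was, or if requested.
  r.is_contiguous = is_contiguous || to_contiguous;
  return r;
}
```

One practical consequence is that calling ToHost on a contiguous host tensor returns the original tensor, so modifying the result also modifies the source.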

CreateDeviceTensor

static Tensor tcim::Tensor::CreateDeviceTensor(const TensorInfo &info, size_t mem_size = 0, int device_id = 0, const std::string &backend_name = "")

Creates a tensor and allocates memory on Houmo device.

Parameters:
  • info -- [in] The information of the tensor.

  • mem_size -- [in] The size, in bytes, of the automatically allocated tensor buffer. It must be equal to or larger than the value returned by the tensor info's MemSize().

  • device_id -- [in] The logical device ID of the Houmo device on which the buffer memory is allocated. The device 0 is used by default. You can retrieve the logical device IDs via SMI tool. See "SMI Tool User Guide" for details.

  • backend_name -- [in] The backend name of the Houmo device. Only the default value can be used. You can retrieve the backend name via Module::GetBackendName.

Returns:

Returns a Tensor object created on Houmo device.

CreateHostTensor

static Tensor tcim::Tensor::CreateHostTensor(const TensorInfo &info, size_t mem_size = 0, void *ptr = nullptr)

Creates a tensor on host.

Parameters:
  • info -- [in] The information of the tensor.

  • mem_size -- [in] The size, in bytes, of the buffer allocated for the tensor. It must be equal to or larger than the value returned by MemSize() of the tensor info.

  • ptr -- [in] Pointer to pre-allocated memory for the tensor on host. If nullptr, new memory is allocated on host.

Returns:

Returns a Tensor object created in the host memory.
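
The mem_size constraint shared by CreateDeviceTensor and CreateHostTensor can be sketched as a standalone check. This is an illustration under a stated assumption: the reference does not say what the default value 0 means, so the sketch assumes it requests exactly MemSize() bytes. ResolveAllocSize and CheckResolveAllocSize are hypothetical helpers, not tcim functions.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>

// Hypothetical helper, not part of the tcim API.
// ASSUMPTION: requested == 0 (the default) means "allocate exactly MemSize() bytes".
inline std::optional<std::size_t> ResolveAllocSize(std::size_t requested,
                                                   std::size_t info_mem_size) {
  if (requested == 0) {
    return info_mem_size;  // assumed meaning of the default value
  }
  if (requested < info_mem_size) {
    return std::nullopt;  // violates "equal to or larger than MemSize()"
  }
  return requested;  // a larger buffer than strictly needed is allowed
}

// Usage check: a request smaller than MemSize() is rejected.
inline void CheckResolveAllocSize() {
  assert(!ResolveAllocSize(32, 64).has_value());
}
```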

Class TensorInfo

class TensorInfo

Represents information about a tensor, including its shape, data type, memory layout, and other relevant attributes. This class provides methods to retrieve tensor information.

TensorInfo

tcim::TensorInfo::TensorInfo()

Default constructor.

tcim::TensorInfo::TensorInfo(const TensorInfo &other) = default

Default copy constructor.

Parameters:

other -- [in] A TensorInfo object.

tcim::TensorInfo::TensorInfo(TensorInfo &&other) = default

Default move constructor.

Parameters:

other -- [in] A TensorInfo object.

AsContiguous

TensorInfo tcim::TensorInfo::AsContiguous() const

Creates a TensorInfo object whose memory layout information is set to contiguous.

This function generates a new TensorInfo object based on the current TensorInfo object, updating the memory layout information to indicate that the tensor is stored contiguously without strides. This is typically used for tensors stored on host.

You can create tensors on host via Tensor::CreateHostTensor with the tensor information created by this function.

Example
Tensor host_tensor = Tensor::CreateHostTensor(dev_tensor.Info().AsContiguous());

Returns:

Returns a new TensorInfo object with updated memory layout information set to contiguous.

AsFormular

TensorInfo tcim::TensorInfo::AsFormular() const

Creates a TensorInfo object with normalized shape and strides.

This function generates a new TensorInfo object where:

  • Rank0 (scalar) shapes are converted to Rank1[1].

  • Strides are populated if they were empty.

This is useful for loose shape matching where Rank0 and Rank1[1] are considered equivalent.

Returns:

Returns a new TensorInfo object with normalized shape and strides.
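
The loose shape matching described for AsFormular can be illustrated without the library. The sketch below covers only the shape rule (Rank0 becomes Rank1[1]); it does not populate strides, because this section does not state how they are derived. Shape, NormalizeShape, LooselyEqual, and CheckLooseMatch are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical alias for illustration; TensorInfo::Shape() returns the same element type.
using Shape = std::vector<std::int64_t>;

// A rank-0 (empty) shape becomes the rank-1 shape [1]; other shapes are returned unchanged.
inline Shape NormalizeShape(const Shape& shape) {
  return shape.empty() ? Shape{1} : shape;
}

// Two shapes match loosely when their normalized forms are identical.
inline bool LooselyEqual(const Shape& a, const Shape& b) {
  return NormalizeShape(a) == NormalizeShape(b);
}

// Usage check: a scalar and a one-element vector are treated as equivalent.
inline void CheckLooseMatch() {
  assert(LooselyEqual(Shape{}, Shape{1}));
}
```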

AsType

TensorInfo tcim::TensorInfo::AsType(DataType dtype) const

Creates a TensorInfo object with updated data type based on the current TensorInfo object.

Parameters:

dtype -- [in] The new data type defined in DataType.

Returns:

Returns a new TensorInfo object with the updated data type.

Clone

TensorInfo tcim::TensorInfo::Clone() const

Creates a copy of the current TensorInfo object.

Returns:

A new TensorInfo object that is a clone of the current object.

DataType

tcim::DataType tcim::TensorInfo::DataType() const

Retrieves the data type of the tensor associated with the current TensorInfo object.

Returns:

Returns the data type of the tensor.

DataTypeSize

size_t tcim::TensorInfo::DataTypeSize() const

Retrieves the memory size (in bytes) of a single element in the tensor based on the data type of the tensor.

For example, if the data type of a tensor is FLOAT32, this function returns 4, as a FLOAT32 value typically occupies 4 bytes in memory.

Returns:

Returns the memory size (in bytes) of a single element in the tensor.
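
As an illustration of how the element size feeds into the byte size of a contiguous tensor, the sketch below multiplies the element count by the per-element size. ElemType, ElemSize, and ContiguousBytes are hypothetical; only the FLOAT32 size of 4 bytes comes from this reference, and the other element types are assumptions chosen for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical element types; the real set is defined by tcim::DataType.
enum class ElemType { kInt8, kFloat16, kFloat32 };

// Size in bytes of one element of the given type.
inline std::size_t ElemSize(ElemType type) {
  switch (type) {
    case ElemType::kInt8: return 1;
    case ElemType::kFloat16: return 2;
    case ElemType::kFloat32: return 4;
  }
  return 0;  // not reached for the enumerators above
}

// Byte size of a contiguous tensor: element count times element size.
inline std::size_t ContiguousBytes(const std::vector<std::int64_t>& shape, ElemType type) {
  std::size_t count = 1;  // a rank-0 tensor holds one element
  for (std::int64_t dim : shape) {
    assert(dim >= 0);  // negative dimensions are not meaningful here
    count *= static_cast<std::size_t>(dim);
  }
  return count * ElemSize(type);
}
```

For example, a 2 x 3 tensor of 4-byte elements occupies 24 bytes when stored contiguously.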

Format

tcim::DataFmt tcim::TensorInfo::Format() const

Retrieves the data format of the tensor associated with the current TensorInfo object.

Returns:

Returns the data format of the tensor.

GetInitStatus

Status tcim::TensorInfo::GetInitStatus() const

Retrieves the status of the constructor function and related resources.

Returns:

Returns the status of the constructor function.

GetQuantInfo

std::shared_ptr<QuantInfo> tcim::TensorInfo::GetQuantInfo() const

Retrieves the quantization information of the tensor associated with the current TensorInfo object.

Returns:

Returns a shared pointer to the quantization information of the tensor.

IsContiguous

bool tcim::TensorInfo::IsContiguous() const

Checks if the tensor data associated with the current TensorInfo object has a contiguous memory layout. A contiguous tensor either has no strides or has strides that follow directly from the shape of the tensor. For example:

strides[n] = shape[n-1] * shape[n-2] * ... * shape[0]

where:

  • n: The index of the tensor dimension.

  • strides[n]: The number of elements to move across dimension n.

  • shape[n]: The size of the n-th dimension of the tensor.

Returns:

Returns true if the tensor has a contiguous memory layout.
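
The stride formula above can be checked mechanically. The sketch below reads the formula literally: strides are counted in elements, strides[0] is the empty product 1, and each later stride is the product of all lower-indexed dimensions. Whether the library orders dimensions this way, or compares strides in bytes, is not confirmed by this section, so treat MatchesStrideFormula and CheckStrideFormula as hypothetical illustrations rather than the IsContiguous implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical check that follows the documented formula literally (strides in elements).
inline bool MatchesStrideFormula(const std::vector<std::int64_t>& shape,
                                 const std::vector<std::int64_t>& strides) {
  if (strides.empty()) {
    return true;  // "no strides" counts as contiguous
  }
  if (strides.size() != shape.size()) {
    return false;  // one stride per dimension is required
  }
  std::int64_t expected = 1;  // strides[0]: empty product
  for (std::size_t n = 0; n < shape.size(); ++n) {
    if (strides[n] != expected) {
      return false;
    }
    expected *= shape[n];  // becomes shape[n] * ... * shape[0] for the next dimension
  }
  return true;
}

// Usage check: shape [2, 3, 4] gives strides [1, 2, 6] under this reading.
inline void CheckStrideFormula() {
  assert(MatchesStrideFormula({2, 3, 4}, {1, 2, 6}));
}
```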

IsMatch

bool tcim::TensorInfo::IsMatch(TensorInfo &other) const

Checks if the associated tensors match for mutual copying.

The associated tensors only match if the following requirements are met:

  • The name, format, shape, and precision of source and target tensors must be identical and valid.

Parameters:

other -- [in] The TensorInfo object to compare with.

Returns:

Returns true if the associated tensors match for mutual copying. Otherwise, returns false.
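
The matching rule above amounts to a field-by-field comparison. The sketch below models it with a hypothetical InfoFields struct; the integer placeholders for format and precision, and the single valid flag, are assumptions, since the actual members of TensorInfo are not listed in this section.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for the fields that IsMatch is documented to compare.
struct InfoFields {
  std::string name;
  int format;                       // placeholder for a DataFmt value
  std::vector<std::int64_t> shape;
  int precision;                    // placeholder for a DataType value
  bool valid;                       // assumed flag: all fields hold valid values
};

// Both sides must be valid, and name, format, shape, and precision must be identical.
inline bool FieldsMatch(const InfoFields& a, const InfoFields& b) {
  return a.valid && b.valid && a.name == b.name && a.format == b.format &&
         a.shape == b.shape && a.precision == b.precision;
}

// Usage check: identical, valid fields match.
inline void CheckFieldsMatch() {
  const InfoFields info{"input0", 0, {1, 3, 224, 224}, 1, true};
  assert(FieldsMatch(info, info));
}
```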

MemSize

size_t tcim::TensorInfo::MemSize() const

Retrieves the memory size (in bytes) allocated for the tensor associated with the current TensorInfo object.

Note

  • For non-contiguous tensors, the returned memory size may be equal to or larger than the actual size of the tensor due to alignment requirements on Houmo device memory.

  • Size() equals MemSize() only for contiguous tensors.

Returns:

Returns the memory size of the tensor in bytes.

operator!=

bool tcim::TensorInfo::operator!=(const TensorInfo &other) const noexcept

Checks if the current TensorInfo object is not equal to another TensorInfo object.

Parameters:

other -- [in] The TensorInfo object to compare with.

Returns:

Returns true if the two TensorInfo objects are not equal. Otherwise, returns false.

operator=

TensorInfo &tcim::TensorInfo::operator=(const TensorInfo &other) = default

Default copy assignment operator. Assigns the tensor information of a TensorInfo object to the current TensorInfo object.

Parameters:

other -- [in] The TensorInfo object to copy from.

Returns:

Reference to the current TensorInfo object.

TensorInfo &tcim::TensorInfo::operator=(TensorInfo &&other) = default

Default move assignment operator. Moves the tensor information of a TensorInfo object into the current TensorInfo object.

Parameters:

other -- [in] The TensorInfo object to move from.

Returns:

Reference to the current TensorInfo object.

operator==

bool tcim::TensorInfo::operator==(const TensorInfo &other) const noexcept

Checks if the current TensorInfo object is equal to another TensorInfo object.

Parameters:

other -- [in] The TensorInfo object to compare with.

Returns:

Returns true if the two TensorInfo objects are equal. Otherwise, returns false.

Shape

const std::vector<int64_t> &tcim::TensorInfo::Shape() const

Retrieves the shape of the tensor associated with the current TensorInfo object.

Returns:

Returns the shape of the tensor.

SpanSize

size_t tcim::TensorInfo::SpanSize() const

Retrieves the padding size in bytes for memory alignment at the end of the memory block used to store the tensor.

This function returns the size of the memory reserved at the end of the memory block to satisfy memory alignment requirements. This padding is not part of the actual data size of the tensor; it exists only for alignment.

Returns:

Returns the padding size in bytes for alignment at the end of the memory block for storing the tensor.
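
To make the idea of tail padding concrete, the sketch below rounds a data size up to an alignment boundary and reports the bytes added. The alignment actually required by the Houmo device is not stated in this section, so the alignment argument is an assumption supplied by the caller, and AlignUp and TailPadding are hypothetical helpers rather than the computation behind SpanSize.

```cpp
#include <cassert>
#include <cstddef>

// Rounds size up to the next multiple of alignment (hypothetical helper).
inline std::size_t AlignUp(std::size_t size, std::size_t alignment) {
  assert(alignment > 0);  // an alignment of 0 is meaningless
  return (size + alignment - 1) / alignment * alignment;
}

// Padding bytes appended after the data so that the block ends on an alignment boundary.
inline std::size_t TailPadding(std::size_t size, std::size_t alignment) {
  return AlignUp(size, alignment) - size;
}
```

With an assumed 64-byte alignment, 100 bytes of data are followed by 28 bytes of padding, and 128 bytes of data need none.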

Stride

const std::vector<int64_t> &tcim::TensorInfo::Stride() const

Retrieves the stride of the tensor associated with the current TensorInfo object. The stride represents the number of bytes to be skipped in memory to move from one element to another along each dimension.

Returns:

Returns a reference to a vector containing the stride for each dimension of the tensor.
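
Given per-dimension strides in bytes, as described above, the byte offset of an element is the sum of each index multiplied by the stride of its dimension. The sketch below shows that arithmetic; ByteOffset is a hypothetical helper, not part of TensorInfo.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Byte offset of the element at `index`, given one byte stride per dimension.
inline std::int64_t ByteOffset(const std::vector<std::int64_t>& index,
                               const std::vector<std::int64_t>& byte_strides) {
  assert(index.size() == byte_strides.size());  // one index per dimension
  std::int64_t offset = 0;
  for (std::size_t i = 0; i < index.size(); ++i) {
    offset += index[i] * byte_strides[i];
  }
  return offset;
}
```

For example, with byte strides [32, 4], the element at index [1, 2] starts 1 * 32 + 2 * 4 = 40 bytes from the beginning of the tensor data.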

CreateNDInfo

static TensorInfo tcim::TensorInfo::CreateNDInfo(const std::vector<int64_t> &shape, const DataType type, const std::vector<int64_t> &stride = {}, const size_t span_size = 0)

Creates a TensorInfo object for a non-image (ND) tensor with the specified shape, data type, stride, and span size.

Parameters:
  • shape -- [in] A vector specifying the size of each dimension of the tensor.

  • type -- [in] The data type of the tensor defined in DataType.

  • stride -- [in] (Optional) A vector specifying the stride for each dimension. The stride defines the memory layout and the number of bytes to move between adjacent elements along each dimension. Defaults to an empty vector, indicating a contiguous memory layout.

  • span_size -- [in] (Optional) The number of padding bytes used for memory alignment. Defaults to 0.

Returns:

Returns a TensorInfo object representing information about a multi-dimensional non-image tensor.

CreateYUVInfo

static TensorInfo tcim::TensorInfo::CreateYUVInfo(int64_t n, int64_t w, int64_t h, DataFmt yuv_format)

Creates a TensorInfo object for a batch of YUV images with the specified batch size, width, height, and format.

This function is used to create a TensorInfo for a batch of YUV images.

Parameters:
  • n -- [in] The number of YUV images in the batch.

  • w -- [in] The width of each YUV image.

  • h -- [in] The height of each YUV image.

  • yuv_format -- [in] The format of the YUV image defined in DataFmt.

Returns:

Returns a TensorInfo object representing information about a tensor that contains a batch of YUV images.

static TensorInfo tcim::TensorInfo::CreateYUVInfo(int64_t w, int64_t h, DataFmt yuv_format)

Creates a TensorInfo object for a YUV image with the specified image width, height, and format.

This function is used to create a TensorInfo for a single YUV image.

Parameters:
  • w -- [in] The width of the YUV image.

  • h -- [in] The height of the YUV image.

  • yuv_format -- [in] The format of the YUV image defined in DataFmt.

Returns:

Returns a TensorInfo object representing information about an image tensor.
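
For a sense of how n, w, and h translate into memory, the sketch below computes plane sizes for a batch of NV12 images (8-bit YUV 4:2:0 with a full-resolution Y plane and an interleaved half-height UV plane). NV12 is chosen here only as a common example: the formats actually offered by DataFmt are not listed in this section, no alignment padding is modeled, and Nv12Sizes and ComputeNv12Sizes are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical result type: byte sizes for a batch of NV12 images, without padding.
struct Nv12Sizes {
  std::size_t y_bytes;      // all Y planes in the batch
  std::size_t uv_bytes;     // all interleaved UV planes in the batch
  std::size_t total_bytes;  // y_bytes + uv_bytes
};

inline Nv12Sizes ComputeNv12Sizes(std::int64_t n, std::int64_t w, std::int64_t h) {
  assert(n > 0 && w > 0 && h > 0);
  assert(w % 2 == 0 && h % 2 == 0);  // 4:2:0 subsampling needs even dimensions
  const std::size_t y = static_cast<std::size_t>(w) * static_cast<std::size_t>(h);
  const std::size_t uv = y / 2;  // U and V are each a quarter of Y, stored interleaved
  const std::size_t batch = static_cast<std::size_t>(n);
  return {batch * y, batch * uv, batch * (y + uv)};
}
```

Under these assumptions, a single 1920 x 1080 image needs 2073600 bytes for Y and 1036800 bytes for UV.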

MergeYUV

static TensorInfo tcim::TensorInfo::MergeYUV(const TensorInfo &y, const TensorInfo &uv)

Creates a TensorInfo object for a merged YUV tensor from the tensor information of the Y tensor (Y component) and the UV tensor (UV component).

This function combines the information of Y and UV tensors into a single TensorInfo object that represents the information of a complete YUV image. Use this function when you have separate Y and UV tensors, and need to create a TensorInfo object for a single merged YUV tensor.

Parameters:
  • y -- [in] The TensorInfo object for the Y tensor, representing the Y component of the YUV image.

  • uv -- [in] The TensorInfo object for the UV tensor, representing the UV component of the YUV image.

Returns:

Returns a TensorInfo object representing the information of the merged YUV tensor.