Model Inference
For a detailed description of the C interfaces, see NNToolChain.pdf and BMLib_User_Guide.pdf.
For a detailed description of the Python interfaces, see Sophon_Inference_zh.pdf.
BMRuntime reads the compiled output of BMCompiler (.bmodel) and drives its execution on the BITMAIN TPU. BMRuntime provides a rich set of interfaces to make it easy for users to port their algorithms.
BMRuntime is implemented in C/C++/Python. This chapter mainly covers the commonly used C and Python interfaces:
BMLIB interface: used for device management
The BMRuntime C interface
The BMLIB and BMRuntime Python interfaces
1. The BMLib C Interface
The BMLIB interface is used for device management. It is not part of BMRuntime, but the two are used together, so it is introduced first.
BMLIB provides a C interface; the corresponding header file is bmlib_runtime.h and the corresponding library is libbmlib.so.
BMLIB handles device management, including device memory management.
BMLIB has many interfaces; this section covers only those an application typically needs.
bm_dev_request
Requests a device and returns a device handle. All other device interfaces require this handle.
devid is the device index. In PCIE mode, it selects the target device when multiple devices are present; in SoC mode, it must be 0.
/**
 * @name    bm_dev_request
 * @brief   To create a handle for the given device
 * @ingroup bmlib_runtime
 *
 * @param [out] handle  The created handle
 * @param [in]  devid   Specify on which device to create handle
 * @retval BM_SUCCESS   Succeeds.
 *         Other code   Fails.
 */
bm_status_t bm_dev_request(bm_handle_t *handle, int devid);
bm_dev_free
Releases a device. An application typically requests a device at startup and releases it before exiting.
/**
 * @name  bm_dev_free
 * @brief To free a handle
 * @param [in] handle  The handle to free
 */
void bm_dev_free(bm_handle_t handle);
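As a minimal sketch of the typical request/free pairing (device 0 assumed):

bm_handle_t handle;
bm_status_t status = bm_dev_request(&handle, 0);  // request device 0
if (status != BM_SUCCESS) {
  // handle the error
}
// ... use the device ...
bm_dev_free(handle);  // release the device before exiting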
2. The BMRuntime C Interface
The corresponding header file is bmruntime_interface.h and the corresponding library is libbmrt.so.
This is the recommended C interface for user programs; it supports statically compiled networks with multiple shapes as well as dynamically compiled networks.
bmrt_create
/**
 * @name   bmrt_create
 * @brief  To create the bmruntime with bm_handle.
 *
 * This API creates the bmruntime. It returns a void* pointer which is the
 * pointer of bmruntime. Device id is set when getting bm_handle.
 * @param [in] bm_handle  bm handle. It must be initialized by using bmlib.
 * @retval void*  the pointer of bmruntime
 */
void* bmrt_create(bm_handle_t bm_handle);
bmrt_destroy
/**
 * @name    bmrt_destroy
 * @brief   To destroy the bmruntime pointer
 * @ingroup bmruntime
 *
 * This API destroys the bmruntime.
 * @param [in] p_bmrt  Bmruntime that had been created
 */
void bmrt_destroy(void* p_bmrt);
bmrt_load_bmodel
Loads a bmodel file. After loading, the networks contained in the bmodel are available in bmruntime and can be used for inference.
/**
 * @name   bmrt_load_bmodel
 * @brief  To load the bmodel which is created by BM compiler
 *
 * This API is to load bmodel created by BM compiler.
 * After loading bmodel, we can run the inference of the neural network.
 * @param [in] p_bmrt       Bmruntime that had been created
 * @param [in] bmodel_path  Bmodel file directory.
 * @retval true   Load context success.
 * @retval false  Load context failed.
 */
bool bmrt_load_bmodel(void* p_bmrt, const char *bmodel_path);
bmrt_load_bmodel_data
Loads a bmodel like bmrt_load_bmodel, except that the bmodel data resides in memory.
/*
 * Parameters: [in] p_bmrt      - Bmruntime that had been created.
 *             [in] bmodel_data - Bmodel data pointer to buffer.
 *             [in] size        - Bmodel data size.
 * Returns:    bool - true: success; false: failed.
 */
bool bmrt_load_bmodel_data(void* p_bmrt, const void * bmodel_data, size_t size);
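For illustration, a minimal sketch that reads a bmodel file into a buffer and loads it from memory. The file path is hypothetical, and p_bmrt is assumed to have been created with bmrt_create; in practice the buffer might also come from a network stream or an embedded resource:

#include <fstream>
#include <vector>

std::ifstream file("net.bmodel", std::ios::binary | std::ios::ate);  // hypothetical path
std::streamsize size = file.tellg();  // file size in bytes
file.seekg(0, std::ios::beg);
std::vector<char> buffer(size);
file.read(buffer.data(), size);
bool ret = bmrt_load_bmodel_data(p_bmrt, buffer.data(), buffer.size());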
bmrt_get_network_info
bmrt_get_network_info looks up a network by name and returns its information.
/* bm_stage_info_t holds input shapes and output shapes;
   every network can contain one or more stages */
typedef struct {
  bm_shape_t* input_shapes;   /* input_shapes[0] / [1] / ... / [input_num-1] */
  bm_shape_t* output_shapes;  /* output_shapes[0] / [1] / ... / [output_num-1] */
} bm_stage_info_t;

/* bm_net_info_t holds all information of one net */
typedef struct {
  const char* name;               /* net name */
  bool is_dynamic;                /* dynamic or static */
  int input_num;                  /* number of inputs */
  char const** input_names;       /* input_names[0] / [1] / ... / [input_num-1] */
  bm_data_type_t* input_dtypes;   /* input_dtypes[0] / [1] / ... / [input_num-1] */
  float* input_scales;            /* input_scales[0] / [1] / ... / [input_num-1] */
  int output_num;                 /* number of outputs */
  char const** output_names;      /* output_names[0] / [1] / ... / [output_num-1] */
  bm_data_type_t* output_dtypes;  /* output_dtypes[0] / [1] / ... / [output_num-1] */
  float* output_scales;           /* output_scales[0] / [1] / ... / [output_num-1] */
  int stage_num;                  /* number of stages */
  bm_stage_info_t* stages;        /* stages[0] / [1] / ... / [stage_num-1] */
} bm_net_info_t;
bm_net_info_t describes all the information of one network; bm_stage_info_t describes the different shapes that the network supports.
/**
 * @name   bmrt_get_network_info
 * @brief  To get network info by net name
 * @param [in] p_bmrt    Bmruntime that had been created
 * @param [in] net_name  Network name
 * @retval bm_net_info_t*  Pointer to net info; needn't be freed by user.
 *                         Returns NULL if the net name is not found.
 */
const bm_net_info_t* bmrt_get_network_info(void* p_bmrt, const char* net_name);
Example code:
const char *model_name = "VGG_VOC0712_SSD_300X300_deploy";
const char **net_names = NULL;
bm_handle_t bm_handle;
bm_dev_request(&bm_handle, 0);
void *p_bmrt = bmrt_create(bm_handle);
std::string bmodel;  // path to the bmodel file
bool ret = bmrt_load_bmodel(p_bmrt, bmodel.c_str());
const bm_net_info_t *net_info = bmrt_get_network_info(p_bmrt, model_name);
int net_num = bmrt_get_network_number(p_bmrt);
bmrt_get_network_names(p_bmrt, &net_names);
for (int i = 0; i < net_num; i++) {
  // do something here
  ......
}
free(net_names);
bmrt_destroy(p_bmrt);
bm_dev_free(bm_handle);
bmrt_shape_count
The interface is declared as follows:
/* number of shape elements; shape should not be NULL and
   num_dims should not be larger than BM_MAX_DIMS_NUM */
uint64_t bmrt_shape_count(const bm_shape_t* shape);
Returns the number of elements of the shape.
For example, if num_dims is 4, the result is dims[0]*dims[1]*dims[2]*dims[3].
The bm_shape_t structure:
typedef struct {
  int num_dims;
  int dims[BM_MAX_DIMS_NUM];
} bm_shape_t;
bm_shape_t describes the shape of a tensor; at most 8 dimensions are currently supported. num_dims is the actual number of dimensions of the tensor and dims holds the dimension values, starting from dims[0]. For example, the four dimensions (n, c, h, w) correspond to (dims[0], dims[1], dims[2], dims[3]).
A constant shape can be initialized as follows:
bm_shape_t shape = {4, {4, 3, 228, 228}};
bm_shape_t shape_array[2] = {
  {4, {4, 3, 28, 28}},  // [0]
  {2, {2, 4}},          // [1]
};
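For example, applying bmrt_shape_count to the first shape above:

uint64_t count = bmrt_shape_count(&shape);  // 4 * 3 * 228 * 228 = 623808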
bm_image_from_mat
To use this function, the USE_OPENCV macro in include/bmruntime/bm_wrapper.hpp must be enabled.

/**
 * @name   bm_image_from_mat
 * @brief  Convert opencv Mat object to BMCV bm_image object
 * @param [in]  in   OPENCV mat object
 * @param [out] out  BMCV bm_image object
 * @retval true   Launch success.
 * @retval false  Launch failed.
 */
static inline bool bm_image_from_mat (cv::Mat &in, bm_image &out)
/* @brief Convert multiple opencv Mat objects to multiple BMCV bm_image objects */
static inline bool bm_image_from_mat (std::vector<cv::Mat> &in, std::vector<bm_image> &out)
For example code, see examples/SSD_object/cpp_cv_bmcv_bmrt in the bmnnsdk2 SDK.
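A minimal sketch of the single-image overload (the image path is hypothetical):

cv::Mat mat = cv::imread("test.jpg");  // hypothetical input image
bm_image bmimg;
bool ok = bm_image_from_mat(mat, bmimg);
// ... feed bmimg to inference, e.g. via bm_inference ...
bm_image_destroy(bmimg);  // free the bm_image when no longer needed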
bm_image_from_frame
/**
 * @name    bm_image_from_frame
 * @brief   Convert an ffmpeg AVFrame object to a BMCV bm_image object
 * @ingroup bmruntime
 *
 * @param [in]  bm_handle  the low level device handle
 * @param [in]  in         a read-only avframe
 * @param [out] out        an uninitialized BMCV bm_image object;
 *                         use bm_image_destroy to free it when you no longer use it.
 * @retval true   change success.
 * @retval false  change failed.
 */
static inline bool bm_image_from_frame (bm_handle_t &bm_handle, AVFrame &in, bm_image &out)
/**
 * @name    bm_image_from_frame
 * @brief   Convert ffmpeg AVFrames to BMCV bm_image objects
 * @ingroup bmruntime
 *
 * @param [in]  bm_handle  the low level device handle
 * @param [in]  in         a read-only ffmpeg avframe vector
 * @param [out] out        an uninitialized BMCV bm_image vector;
 *                         use bm_image_destroy to free each element when you no longer use it.
 * @retval true   change success.
 * @retval false  change failed.
 */
static inline bool bm_image_from_frame (bm_handle_t &bm_handle, std::vector<AVFrame> &in, std::vector<bm_image> &out)
For example code, see examples/SSD_object/cpp_ffmpeg_bmcv_bmrt/main.cpp in the bmnnsdk2 SDK.
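A minimal sketch of the single-frame overload, assuming the frame is filled by your own FFmpeg decode loop:

AVFrame *frame = av_frame_alloc();
// ... decode one frame into `frame` with FFmpeg (omitted) ...
bm_image bmimg;
bool ok = bm_image_from_frame(bm_handle, *frame, bmimg);
// ... run inference on bmimg ...
bm_image_destroy(bmimg);  // required once bmimg is no longer used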
bm_inference
To use this function, the USE_OPENCV macro in include/bmruntime/bm_wrapper.hpp must be enabled.

/**
 * @name    bm_inference
 * @brief   A blocking inference wrapper call
 * @ingroup bmruntime
 *
 * This API supports networks that are static-compiled or dynamic-compiled.
 * After calling this API, inference on the TPU is launched and the CPU
 * program is blocked until it finishes.
 * This overload supports single input && single output, and is multi-thread safe.
 *
 * @param [in] p_bmrt       Bmruntime that had been created
 * @param [in] input        bm_image of single-input data
 * @param [in] output       Pointer of single-output buffer
 * @param [in] input_shape  single-input shape
 * @param [in] net_name     The name of the neural network
 *
 * @retval true   Launch success.
 * @retval false  Launch failed.
 */
static inline bool bm_inference (void *p_bmrt, bm_image *input, void *output, bm_shape_t input_shape, const char *net_name)
/* This overload supports single input && multiple outputs, and is multi-thread safe */
static inline bool bm_inference (void *p_bmrt, bm_image *input, std::vector<void*> outputs, bm_shape_t input_shape, const char *net_name)
/* This overload supports multiple inputs && multiple outputs, and is multi-thread safe */
static inline bool bm_inference (void *p_bmrt, std::vector<bm_image*> inputs, std::vector<void*> outputs, std::vector<bm_shape_t> input_shapes, const char *net_name)
For example code, see examples/SSD_object/cpp_cv_bmcv_bmrt/main.cpp in the bmnnsdk2 SDK.
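As an illustration, a minimal sketch of the single-input/single-output overload; the input shape and output buffer size are hypothetical and must match your own model:

bm_image input;  // assume already created, e.g. via bm_image_from_mat
float output[1400];  // hypothetical buffer, large enough for this net's output
bm_shape_t input_shape = {4, {1, 3, 300, 300}};  // hypothetical input shape
bool ok = bm_inference(p_bmrt, &input, (void *)output, input_shape,
                       "VGG_VOC0712_SSD_300X300_deploy");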
3. The Python Interface
This section covers only the interface functions used by the py_ffmpeg_bmcv_sail example.
For more interface definitions, see Sophon_Inference_zh.pdf.
Engine
def __init__(tpu_id):
    """ Constructor does not load bmodel.

    Parameters
    ----------
    tpu_id : int
        TPU ID. You can use bm-smi to see available IDs.
    """
load
def load(bmodel_path):
    """ Load bmodel from file.

    Parameters
    ----------
    bmodel_path : str
        Path to bmodel.
    """
set_io_mode
def set_io_mode(mode):
    """ Set IOMode for a graph.

    Parameters
    ----------
    mode : sail.IOMode
        Specified io mode.
    """
get_graph_names
def get_graph_names():
    """ Get all graph names in the loaded bmodels.

    Returns
    -------
    graph_names : list
        Graph names list in loaded context.
    """
get_input_names
def get_input_names(graph_name):
    """ Get all input tensor names of the specified graph.

    Parameters
    ----------
    graph_name : str
        Specified graph name.

    Returns
    -------
    input_names : list
        All the input tensor names of the graph.
    """
get_output_names
def get_output_names(graph_name):
    """ Get all output tensor names of the specified graph.

    Parameters
    ----------
    graph_name : str
        Specified graph name.

    Returns
    -------
    output_names : list
        All the output tensor names of the graph.
    """
sail.IOMode
# Input tensors are in system memory while output tensors are in device memory.
sail.IOMode.SYSI
# Input tensors are in device memory while output tensors are in system memory.
sail.IOMode.SYSO
# Both input and output tensors are in system memory.
sail.IOMode.SYSIO
# Both input and output tensors are in device memory.
sail.IOMode.DEVIO
sail.Tensor
def __init__(handle, shape, dtype, own_sys_data, own_dev_data):
    """ Constructor allocates system memory and device memory of the tensor.

    Parameters
    ----------
    handle : sail.Handle
        Handle instance.
    shape : tuple
        Tensor shape.
    dtype : sail.Dtype
        Data type.
    own_sys_data : bool
        Indicator of whether own system memory.
    own_dev_data : bool
        Indicator of whether own device memory.
    """
get_input_dtype
def get_input_dtype(graph_name, tensor_name):
    """ Get data type of an input tensor.

    Parameters
    ----------
    graph_name : str
        The specified graph name.
    tensor_name : str
        The specified input tensor name.

    Returns
    -------
    dtype : sail.Dtype
        Data type of the input tensor.
    """
get_output_dtype
def get_output_dtype(graph_name, tensor_name):
    """ Get data type of an output tensor.

    Parameters
    ----------
    graph_name : str
        The specified graph name.
    tensor_name : str
        The specified output tensor name.

    Returns
    -------
    dtype : sail.Dtype
        Data type of the output tensor.
    """
process
def process(graph_name, input_tensors, output_tensors):
    """ Inference with provided input and output tensors.

    Parameters
    ----------
    graph_name : str
        The specified graph name.
    input_tensors : dict {str : sail.Tensor}
        Input tensors managed by user.
    output_tensors : dict {str : sail.Tensor}
        Output tensors managed by user.
    """
get_input_scale
def get_input_scale(graph_name, tensor_name):
    """ Get scale of an input tensor. Only used for int8 models.

    Parameters
    ----------
    graph_name : str
        The specified graph name.
    tensor_name : str
        The specified input tensor name.

    Returns
    -------
    scale : float32
        Scale of the input tensor.
    """
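Putting the pieces together, a minimal end-to-end sketch. The module import path, the bmodel path, the tensor shapes, and engine.get_handle() are assumptions; check them against your SDK version:

import sophon.sail as sail  # assumed import path for the SAIL module

engine = sail.Engine(0)                # use TPU 0 (see bm-smi)
engine.load("ssd300.bmodel")           # hypothetical bmodel path
graph = engine.get_graph_names()[0]
engine.set_io_mode(sail.IOMode.SYSIO)  # keep input/output tensors in system memory

in_name = engine.get_input_names(graph)[0]
out_name = engine.get_output_names(graph)[0]
handle = engine.get_handle()           # assumed helper returning a sail.Handle

# shapes below are hypothetical; use the shapes of your own model
input_tensor = sail.Tensor(handle, (1, 3, 300, 300),
                           engine.get_input_dtype(graph, in_name), True, True)
output_tensor = sail.Tensor(handle, (1, 1, 200, 7),
                            engine.get_output_dtype(graph, out_name), True, True)

# ... fill input_tensor with preprocessed data (omitted) ...
engine.process(graph, {in_name: input_tensor}, {out_name: output_tensor})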