C/C++ Developers
Porting the Caffe SSD300 Algorithm
BMNETC is the model compiler for Caffe. It compiles a network's caffemodel and prototxt into the files required by BMRuntime. During compilation it can also compare each layer's NPU computation result against the CPU result to guarantee correctness. The main tool for Caffe model conversion is described below:
Command name: bmnetc - BMNet compiler command for Caffe model
/path/to/bmnetc [--model=<path>] \
[--weight=<path>] \
[--shapes=<string>] \
[--net_name=<name>] \
[--opt=<value>] \
[--dyn=<bool>] \
[--outdir=<path>] \
[--target=<name>] \
[--cmp=<bool>] \
[--mode=<string>] \
[--enable_profile=<bool>] \
[--show_args] \
[--check_model]
args            type    Description
model           string  Necessary. Path to the Caffe prototxt
weight          string  Necessary. Path to the caffemodel (weights)
shapes          string  Optional. Shapes of all inputs; by default the shapes in the prototxt are used. Format: [x,x,x,x],[x,x]…; entries correspond to the inputs one by one in sequence
net_name        string  Optional. Name of the network; by default the name in the prototxt is used
opt             int     Optional. Optimization level. Options: 0, 1, 2; default 2
dyn             bool    Optional. Use dynamic compilation; default false
outdir          string  Necessary. Output directory
target          string  Necessary. Options: BM1682, BM1684; default BM1682
cmp             bool    Optional. Check results during compilation; default true
mode            string  Optional. Set the bmnetc mode. Options: compile, GenUmodel; default compile
enable_profile  bool    Optional. Enable the profile log; default false
show_args       -       Optional. Display the arguments passed to the bmnetc compiler
check_model     -       Optional. Check the input model for unsupported layer types
Download the SSD model
** Download the model from the public network, or use a model you trained yourself **
SSD model: https://docs.google.com/uc?export=download&id=0BzKzrI_SkD1_WVVTSmQxU0dVRzA
** Sophon provides a script that fetches the caffe-ssd model from the public network **
# SSD_object/model# ./download_model.sh
Compiling the model with the Sophon compiler
** Sophon provides a script that compiles the caffe-ssd model **
SSD_object/model# ./generate_verify_bmodel.sh
** The main content of the script **
bmnetc --model=${model_dir}/ssd300_deploy.prototxt \
--weight=${model_dir}/ssd300.caffemodel \
--shapes=[1,3,300,300] \
--net_name="ssd300-caffe" \
--outdir=./out/ssd300 \
--target=BM1682
** After the script finishes, the normal output looks like this **
============================================================
*** Store bmodel of BMCompiler...
============================================================
I1018 13:52:19.306128 211 bmcompiler_bmodel.cpp:123] [BMCompiler:I] save_tensor inout name [data]
I1018 13:52:19.306155 211 bmcompiler_bmodel.cpp:123] [BMCompiler:I] save_tensor inout name [detection_out]
I1018 13:52:19.884819 211 bmcompiler_bmodel.cpp:123] [BMCompiler:I] save_tensor inout name [data]
I1018 13:52:19.884840 211 bmcompiler_bmodel.cpp:123] [BMCompiler:I] save_tensor inout name [mbox_loc]
I1018 13:52:19.884846 211 bmcompiler_bmodel.cpp:123] [BMCompiler:I] save_tensor inout name [mbox_priorbox]
I1018 13:52:19.884850 211 bmcompiler_bmodel.cpp:123] [BMCompiler:I] save_tensor inout name [mbox_conf_flatten]
I1018 13:52:19.884862 211 bmcompiler_bmodel.cpp:123] [BMCompiler:I] save_tensor inout name [mbox_loc]
I1018 13:52:19.884869 211 bmcompiler_bmodel.cpp:123] [BMCompiler:I] save_tensor inout name [mbox_conf_flatten]
I1018 13:52:19.884872 211 bmcompiler_bmodel.cpp:123] [BMCompiler:I] save_tensor inout name [mbox_priorbox]
I1018 13:52:19.884876 211 bmcompiler_bmodel.cpp:123] [BMCompiler:I] save_tensor inout name [detection_out]
=====do check new Bmodel=====
I1018 13:52:20.933038 218 bmrt_test.cpp:829] [BMRT_TEST:I] Loop num: 1
mult engine c_model init
bmcpu init: skip cpu_user_defined
open usercpu.so, init user_cpu_init
I1018 13:52:20.937220 218 bmruntime_bmodel.cpp:683] [BMRuntime:I] Loading bmodel from [./out/ssd300//compilation.bmodel]. Thanks for your patience...
I1018 13:52:20.937304 218 bmruntime_bmodel.cpp:665] [BMRuntime:I] pre net num: 0, load net num: 1
I1018 13:52:20.966521 218 bmrt_test.cpp:458] [BMRT_TEST:I] ==> running network #0, name: ssd300-caffe, loop: 0
INFO Couldn't find any detections
I1018 13:52:39.612751 218 bmrt_test.cpp:561] [BMRT_TEST:I] the last_api_process_time_us is 0 us
I1018 13:52:39.612772 218 bmrt_test.cpp:603] [BMRT_TEST:I] +++ The network[ssd300-caffe] stage[0] cmp success +++
I1018 13:52:39.612779 218 bmrt_test.cpp:614] [BMRT_TEST:I] load input time(s): 0.000368
I1018 13:52:39.612798 218 bmrt_test.cpp:615] [BMRT_TEST:I] calculate time(s): 18.6458
I1018 13:52:39.612803 218 bmrt_test.cpp:616] [BMRT_TEST:I] get output time(s): 4e-06
I1018 13:52:39.612807 218 bmrt_test.cpp:617] [BMRT_TEST:I] compare time(s): 4.5e-05
** Judging the result **
** If the message "+++ The network[ssd300-caffe] stage[0] cmp success +++" appears, **
** the model compilation flow is correct and the accuracy is consistent with the original model. **
** The compiled model is stored in the following directory **
├── out
│ ├── ssd300
│ │ ├── compilation.bmodel
│ │ ├── input_ref_data.dat
│ │ └── output_ref_data.dat
Loading the Sophon bmodel
/** create device handle **/
bm_status_t status = bm_dev_request(&bm_handle_, 0);
/** create inference runtime handle **/
p_bmrt_ = bmrt_create(bm_handle_);
/** load bmodel by file **/
bool flag = bmrt_load_bmodel(p_bmrt_, bmodel_file.c_str());
...
...
/** input/output tensors (device memory <-> system memory) **/
auto net_info = bmrt_get_network_info(p_bmrt_, net_names_[0]);
/** define the input/output tensors of the net with fixed shapes **/
bmrt_tensor(&input_tensor_, p_bmrt_, net_info->input_dtypes[0], input_shape);
bmrt_tensor(&output_tensor_, p_bmrt_, net_info->output_dtypes[0], output_shape);
/** mmap device memory to system memory **/
status = bm_mem_mmap_device_mem(bm_handle_, &input_tensor_.device_mem, (unsigned long long*)&input_);
status = bm_mem_mmap_device_mem(bm_handle_, &output_tensor_.device_mem, (unsigned long long*)&output_);
Data pre-processing
...
/** Sophon supports reading image files via OpenCV **/
cv::Mat img = cv::imread(input_url);
...
/** set mean **/
std::vector<float> mean_values;
mean_values.push_back(123);
mean_values.push_back(117);
mean_values.push_back(104);
/** resize to 300 * 300 **/
cv::Mat sample_resized(input_geometry_.height, input_geometry_.width, CV_8UC3);
cv::resize(sample, sample_resized, input_geometry_);
/** uint8 to float **/
sample_resized.convertTo(sample_float, CV_32FC3);
/** sample normalized **/
cv::subtract(sample_float, mean_, sample_normalized);
cv::split(sample_normalized, *input_channels);
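The per-pixel arithmetic of the convertTo/subtract steps above can be sketched without OpenCV. In this sketch, `normalize_pixel` is a hypothetical helper name (not an SDK or OpenCV API); it applies the same per-channel means (123, 117, 104) that the code above pushes into `mean_values`.

```cpp
#include <array>

/* Sketch of the normalization step: convert one uint8 pixel to float and
 * subtract the per-channel means (123, 117, 104) used above.
 * normalize_pixel is an illustrative name, not an SDK or OpenCV API. */
std::array<float, 3> normalize_pixel(const std::array<unsigned char, 3>& px) {
    const std::array<float, 3> mean = {123.f, 117.f, 104.f};
    std::array<float, 3> out{};
    for (int c = 0; c < 3; ++c)
        out[c] = static_cast<float>(px[c]) - mean[c];  // uint8 -> float, then mean subtraction
    return out;
}
```

cv::subtract simply applies this arithmetic to every pixel of sample_float at once.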
Running Sophon model inference
/** flush inference **/
status = bm_mem_flush_device_mem(bm_handle_, &input_tensor_.device_mem);
/** ssd inference **/
bool ret = bmrt_launch_tensor_ex(p_bmrt_, net_names_[0], &input_tensor_,
1, &output_tensor_, 1, true, false);
/** sync, wait for finishing inference **/
status = bm_thread_sync(bm_handle_);
/** invalidate the cached output so the CPU sees the device result **/
status = bm_mem_invalidate_device_mem(bm_handle_,
&output_tensor_.device_mem);
Data post-processing
/** get the number of the elements with shape **/
int output_count = bmrt_shape_count(&output_shape);
/** parse the output tensor element by element **/
for (int i = 0; i < output_count; i += 7) {
...
/** output_ is a system global memory for output tensor**/
float *proposal = &output_[i];
detection.class_id = proposal[1];
detection.score = proposal[2];
detection.x1 = proposal[3] * image.cols;
detection.y1 = proposal[4] * image.rows;
detection.x2 = proposal[5] * image.cols;
detection.y2 = proposal[6] * image.rows;
}
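The parsing loop above can be expressed as a standalone helper. `Detection` and `parse_detections` are illustrative names; the field layout of each 7-float proposal ([image_id, class_id, score, x1, y1, x2, y2], with box coordinates normalized to [0, 1]) follows the code above, and a score threshold is added as a common refinement.

```cpp
#include <vector>

/* Standalone sketch of the post-processing loop above. Each proposal is
 * 7 floats: [image_id, class_id, score, x1, y1, x2, y2]. Detection and
 * parse_detections are illustrative names; score_thresh is an addition. */
struct Detection {
    int class_id;
    float score;
    float x1, y1, x2, y2;
};

std::vector<Detection> parse_detections(const float* output, int count,
                                        int img_w, int img_h,
                                        float score_thresh) {
    std::vector<Detection> dets;
    for (int i = 0; i + 6 < count; i += 7) {
        const float* p = &output[i];
        if (p[2] < score_thresh)  // drop low-confidence proposals
            continue;
        Detection d;
        d.class_id = static_cast<int>(p[1]);
        d.score = p[2];
        d.x1 = p[3] * img_w;  // scale normalized coords back to pixels
        d.y1 = p[4] * img_h;
        d.x2 = p[5] * img_w;
        d.y2 = p[6] * img_h;
        dets.push_back(d);
    }
    return dets;
}
```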
Example demonstration
$ cd bmnnsdk2-bm1682_v1.1.4
$ ./docker_run_bmnnsdk.sh
# cd scripts
# ./intall_lib.sh nntc
# source envsetup_cmodel.sh
# cd ../examples/SSD_object/cpp/
# make -f Makefile.arm clean && make -f Makefile.arm
# make -f Makefile.arm install
** Copy the install directory to the SoC board **
** Replace the string "YOUR_SOC_IP" with the actual IP address of the SoC board **
# scp -r /workspace/install linaro@YOUR_SOC_IP:~/
# exit
$ ssh linaro@YOUR_SOC_IP
** Install the SE3 driver **
$ sudo /system/data/chdriver.sh
$ cd install
$ ./bin/ssd300_object.arm image res/ssd300/vehicle_1.jpg model/ssd300 1 1
bmcpu init: skip cpu_user_defined
open usercpu.so, init user_cpu_init
> get model ssd300-caffe successfully
...
thread-0 <cpu-1> | 43 (us) >> [0]lifecycle
thread-0 <cpu-1> | 125 (us) >> [0]ssd overall
thread-0 <cpu-1> | 126 (us) >> [0]read image
thread-0 <cpu-1> | 13222 (us) << [0]read image
thread-0 <cpu-1> | 13291 (us) >> [0]detection
thread-0 <cpu-1> | 13300 (us) >> [0]ssd pre-process
thread-0 <cpu-1> | 25845 (us) << [0]ssd pre-process
thread-0 <cpu-1> | 25850 (us) >> [0]flush inference
thread-0 <cpu-1> | 25958 (us) << [0]flush inference
thread-0 <cpu-1> | 25959 (us) >> [0]ssd inference
thread-0 <cpu-1> | 67027 (us) << [0]ssd inference
thread-0 <cpu-1> | 67042 (us) >> [0]ssd post-process
thread-0 <cpu-1> | 67054 (us) << [0]ssd post-process
thread-0 <cpu-1> | 67055 (us) << [0]detection
thread-0 <cpu-1> | 82971 (us) << [0]ssd overall
thread-0 <cpu-1> | 83061 (us) << [0]lifecycle
############################
DURATIONS: main process
############################
[ total time] - iteration <0> : 83572 us
** After execution completes, the detection result image is generated in the current directory **
$ ls
bin model models out-t0_0_vehicle_1.jpg res