C/C++ Developers
Porting the Caffe SSD300 Algorithm
BMNETC is the model compiler for Caffe. It compiles a network's caffemodel and prototxt into the files required by BMRuntime. During compilation it can also compare each layer's NPU computation result against the CPU result to guarantee correctness. The main tool for Caffe model conversion is described below:
Command name: bmnetc - BMNet compiler command for Caffe model
/path/to/bmnetc [--model=<path>] \
[--weight=<path>] \
[--shapes=<string>] \
[--net_name=<name>] \
[--opt=<value>] \
[--dyn=<bool>] \
[--outdir=<path>] \
[--target=<name>] \
[--cmp=<bool>] \
[--mode=<string>] \
[--enable_profile=<bool>] \
[--show_args] \
[--check_model]
args            type    Description
model           string  Necessary. Path to the Caffe prototxt
weight          string  Necessary. Path to the caffemodel (weights)
shapes          string  Optional. Shapes of all inputs; by default the shapes in the prototxt are used. Format: [x,x,x,x],[x,x]…; entries correspond to the inputs one by one in sequence
net_name        string  Optional. Name of the network; by default the name in the prototxt is used
opt             int     Optional. Optimization level. Options: 0, 1, 2; default 2
dyn             bool    Optional. Use dynamic compilation; default false
outdir          string  Necessary. Output directory
target          string  Necessary. Options: BM1682, BM1684; default BM1682
cmp             bool    Optional. Check results during compilation; default true
mode            string  Optional. Set the bmnetc mode. Options: compile, GenUmodel; default compile
enable_profile  bool    Optional. Enable the profile log; default false
show_args       -       Optional. Display the arguments passed to the bmnetc compiler
check_model     -       Optional. Check the input model for unsupported layer types
Download the SSD model
** Download the model from the public network, or use a model you trained yourself **
SSD model: https://docs.google.com/uc?export=download&id=0BzKzrI_SkD1_WVVTSmQxU0dVRzA
** Sophon provides a script that fetches the caffe-ssd model from the public network **
# SSD_object/model# ./download_model.sh
Compiling the model with the Sophon compiler
** Sophon provides a script that compiles the caffe-ssd model **
SSD_object/model# ./generate_verify_bmodel.sh
** The main content of the script **
bmnetc --model=${model_dir}/ssd300_deploy.prototxt \
--weight=${model_dir}/ssd300.caffemodel \
--shapes=[1,3,300,300] \
--net_name="ssd300-caffe" \
--outdir=./out/ssd300 \
--target=BM1682
** After the script finishes, the normal output looks like this **
============================================================
*** Store bmodel of BMCompiler...
============================================================
I1018 13:52:19.306128 211 bmcompiler_bmodel.cpp:123] [BMCompiler:I] save_tensor inout name [data]
I1018 13:52:19.306155 211 bmcompiler_bmodel.cpp:123] [BMCompiler:I] save_tensor inout name [detection_out]
I1018 13:52:19.884819 211 bmcompiler_bmodel.cpp:123] [BMCompiler:I] save_tensor inout name [data]
I1018 13:52:19.884840 211 bmcompiler_bmodel.cpp:123] [BMCompiler:I] save_tensor inout name [mbox_loc]
I1018 13:52:19.884846 211 bmcompiler_bmodel.cpp:123] [BMCompiler:I] save_tensor inout name [mbox_priorbox]
I1018 13:52:19.884850 211 bmcompiler_bmodel.cpp:123] [BMCompiler:I] save_tensor inout name [mbox_conf_flatten]
I1018 13:52:19.884862 211 bmcompiler_bmodel.cpp:123] [BMCompiler:I] save_tensor inout name [mbox_loc]
I1018 13:52:19.884869 211 bmcompiler_bmodel.cpp:123] [BMCompiler:I] save_tensor inout name [mbox_conf_flatten]
I1018 13:52:19.884872 211 bmcompiler_bmodel.cpp:123] [BMCompiler:I] save_tensor inout name [mbox_priorbox]
I1018 13:52:19.884876 211 bmcompiler_bmodel.cpp:123] [BMCompiler:I] save_tensor inout name [detection_out]
=====do check new Bmodel=====
I1018 13:52:20.933038 218 bmrt_test.cpp:829] [BMRT_TEST:I] Loop num: 1
mult engine c_model init
bmcpu init: skip cpu_user_defined
open usercpu.so, init user_cpu_init
I1018 13:52:20.937220 218 bmruntime_bmodel.cpp:683] [BMRuntime:I] Loading bmodel from [./out/ssd300//compilation.bmodel]. Thanks for your patience...
I1018 13:52:20.937304 218 bmruntime_bmodel.cpp:665] [BMRuntime:I] pre net num: 0, load net num: 1
I1018 13:52:20.966521 218 bmrt_test.cpp:458] [BMRT_TEST:I] ==> running network #0, name: ssd300-caffe, loop: 0
INFO Couldn't find any detections
I1018 13:52:39.612751 218 bmrt_test.cpp:561] [BMRT_TEST:I] the last_api_process_time_us is 0 us
I1018 13:52:39.612772 218 bmrt_test.cpp:603] [BMRT_TEST:I] +++ The network[ssd300-caffe] stage[0] cmp success +++
I1018 13:52:39.612779 218 bmrt_test.cpp:614] [BMRT_TEST:I] load input time(s): 0.000368
I1018 13:52:39.612798 218 bmrt_test.cpp:615] [BMRT_TEST:I] calculate time(s): 18.6458
I1018 13:52:39.612803 218 bmrt_test.cpp:616] [BMRT_TEST:I] get output time(s): 4e-06
I1018 13:52:39.612807 218 bmrt_test.cpp:617] [BMRT_TEST:I] compare time(s): 4.5e-05
** Judging the result **
** If the message "+++ The network[ssd300-caffe] stage[0] cmp success +++" appears, **
** the model compilation flow is correct and the accuracy is consistent with the original model. **
** The compiled model is stored in the following directory **
├── out
│ ├── ssd300
│ │ ├── compilation.bmodel
│ │ ├── input_ref_data.dat
│ │ └── output_ref_data.dat
Loading the Sophon bmodel
/** create device handle **/
bm_status_t status = bm_dev_request(&bm_handle_, 0);
/** create inference runtime handle **/
p_bmrt_ = bmrt_create(bm_handle_);
/** load bmodel by file **/
bool flag = bmrt_load_bmodel(p_bmrt_, bmodel_file.c_str());
...
...
/** input/output tensors (device memory <-> system memory) **/
auto net_info = bmrt_get_network_info(p_bmrt_, net_names_[0]);
/** define the input/output tensors of the net with fixed shapes **/
bmrt_tensor(&input_tensor_, p_bmrt_, net_info->input_dtypes[0], input_shape);
bmrt_tensor(&output_tensor_, p_bmrt_, net_info->output_dtypes[0], output_shape);
/** mmap device memory to system memory **/
status = bm_mem_mmap_device_mem(bm_handle_, &input_tensor_.device_mem, (unsigned long long*)&input_);
status = bm_mem_mmap_device_mem(bm_handle_, &output_tensor_.device_mem, (unsigned long long*)&output_);
Data pre-processing
...
/** Sophon supports reading image files via OpenCV **/
cv::Mat img = cv::imread(input_url);
...
/** set mean **/
std::vector<float> mean_values;
mean_values.push_back(123);
mean_values.push_back(117);
mean_values.push_back(104);
/** resize to 300 * 300 **/
cv::Mat sample_resized(input_geometry_.height, input_geometry_.width, CV_8UC3);
cv::resize(sample, sample_resized, input_geometry_);
/** uint8 to float **/
sample_resized.convertTo(sample_float, CV_32FC3);
/** sample normalized **/
cv::subtract(sample_float, mean_, sample_normalized);
cv::split(sample_normalized, *input_channels);
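The per-pixel arithmetic of the convertTo/subtract steps above can be sketched without OpenCV. In this sketch, `normalize_pixel` is a hypothetical helper name (not an SDK or OpenCV API); it applies the same per-channel means (123, 117, 104) that the code above pushes into `mean_values`.

```cpp
#include <array>

/* Sketch of the normalization step: convert one uint8 pixel to float and
 * subtract the per-channel means (123, 117, 104) used above.
 * normalize_pixel is an illustrative name, not an SDK or OpenCV API. */
std::array<float, 3> normalize_pixel(const std::array<unsigned char, 3>& px) {
    const std::array<float, 3> mean = {123.f, 117.f, 104.f};
    std::array<float, 3> out{};
    for (int c = 0; c < 3; ++c)
        out[c] = static_cast<float>(px[c]) - mean[c];  // uint8 -> float, then mean subtraction
    return out;
}
```

cv::subtract simply applies this arithmetic to every pixel of sample_float at once.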
Running Sophon model inference
/** flush inference **/
status = bm_mem_flush_device_mem(bm_handle_, &input_tensor_.device_mem);
/** ssd inference **/
bool ret = bmrt_launch_tensor_ex(p_bmrt_, net_names_[0], &input_tensor_,
1, &output_tensor_, 1, true, false);
/** sync, wait for finishing inference **/
status = bm_thread_sync(bm_handle_);
/** invalidate the cached output so the CPU sees the device result **/
status = bm_mem_invalidate_device_mem(bm_handle_,
&output_tensor_.device_mem);
Data post-processing
/** get the number of the elements with shape **/
int output_count = bmrt_shape_count(&output_shape);
/** parse the output tensor element by element **/
for (int i = 0; i < output_count; i += 7) {
...
/** output_ is a system global memory for output tensor**/
float *proposal = &output_[i];
detection.class_id = proposal[1];
detection.score = proposal[2];
detection.x1 = proposal[3] * image.cols;
detection.y1 = proposal[4] * image.rows;
detection.x2 = proposal[5] * image.cols;
detection.y2 = proposal[6] * image.rows;
}
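The parsing loop above can be expressed as a standalone helper. `Detection` and `parse_detections` are illustrative names; the field layout of each 7-float proposal ([image_id, class_id, score, x1, y1, x2, y2], with box coordinates normalized to [0, 1]) follows the code above, and a score threshold is added as a common refinement.

```cpp
#include <vector>

/* Standalone sketch of the post-processing loop above. Each proposal is
 * 7 floats: [image_id, class_id, score, x1, y1, x2, y2]. Detection and
 * parse_detections are illustrative names; score_thresh is an addition. */
struct Detection {
    int class_id;
    float score;
    float x1, y1, x2, y2;
};

std::vector<Detection> parse_detections(const float* output, int count,
                                        int img_w, int img_h,
                                        float score_thresh) {
    std::vector<Detection> dets;
    for (int i = 0; i + 6 < count; i += 7) {
        const float* p = &output[i];
        if (p[2] < score_thresh)  // drop low-confidence proposals
            continue;
        Detection d;
        d.class_id = static_cast<int>(p[1]);
        d.score = p[2];
        d.x1 = p[3] * img_w;  // scale normalized coords back to pixels
        d.y1 = p[4] * img_h;
        d.x2 = p[5] * img_w;
        d.y2 = p[6] * img_h;
        dets.push_back(d);
    }
    return dets;
}
```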
Example demonstration
$ cd bmnnsdk2-bm1682_v1.1.4
$ ./docker_run_bmnnsdk.sh
# cd scripts
# ./intall_lib.sh nntc
# source envsetup_cmodel.sh
# cd ../examples/SSD_object/cpp/
# make -f Makefile.arm clean && make -f Makefile.arm
# make -f Makefile.arm install
** Copy the install directory to the SoC board **
** Replace the string "YOUR_SOC_IP" with the actual IP address of the SoC board **
# scp -r /workspace/install linaro@YOUR_SOC_IP:~/
# exit
$ ssh linaro@YOUR_SOC_IP
** Install the SE3 driver **
$ sudo /system/data/chdriver.sh
$ cd install
$ ./bin/ssd300_object.arm image res/ssd300/vehicle_1.jpg model/ssd300 1 1
bmcpu init: skip cpu_user_defined
open usercpu.so, init user_cpu_init
> get model ssd300-caffe successfully
...
thread-0 <cpu-1> | 43 (us) >> [0]lifecycle
thread-0 <cpu-1> | 125 (us) >> [0]ssd overall
thread-0 <cpu-1> | 126 (us) >> [0]read image
thread-0 <cpu-1> | 13222 (us) << [0]read image
thread-0 <cpu-1> | 13291 (us) >> [0]detection
thread-0 <cpu-1> | 13300 (us) >> [0]ssd pre-process
thread-0 <cpu-1> | 25845 (us) << [0]ssd pre-process
thread-0 <cpu-1> | 25850 (us) >> [0]flush inference
thread-0 <cpu-1> | 25958 (us) << [0]flush inference
thread-0 <cpu-1> | 25959 (us) >> [0]ssd inference
thread-0 <cpu-1> | 67027 (us) << [0]ssd inference
thread-0 <cpu-1> | 67042 (us) >> [0]ssd post-process
thread-0 <cpu-1> | 67054 (us) << [0]ssd post-process
thread-0 <cpu-1> | 67055 (us) << [0]detection
thread-0 <cpu-1> | 82971 (us) << [0]ssd overall
thread-0 <cpu-1> | 83061 (us) << [0]lifecycle
############################
DURATIONS: main process
############################
[ total time] - iteration <0> : 83572 us
** After execution completes, the detection result image is generated in the current directory **
$ ls
bin model models out-t0_0_vehicle_1.jpg res