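INT8 Bmodel Conversion

The BM1684 compute board now supports deploying int8 models. In the general workflow, the fp32 model must first be quantized with the quantization tool provided by Bitmain. Quantization-Tools is a network model quantization tool developed in-house by Bitmain: it parses trained 32-bit floating-point network models and generates 8-bit fixed-point network models, which can run on Bitmain's SOPHON series of AI computing platforms. On the SOPHON platform, the inputs, outputs, and coefficients of every layer are represented in 8 bits, which substantially reduces power consumption, memory footprint, and transfer latency and substantially increases computation speed while preserving network accuracy.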
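
To make the idea of an 8-bit fixed-point representation concrete, the following is a minimal sketch of symmetric per-tensor int8 quantization. It is a generic illustration only: the calibration algorithm that Quantization-Tools actually uses is not described here, and the function names `quantize_int8` and `dequantize_int8` are hypothetical.

```python
def quantize_int8(values):
    """Map floats to int8 codes with one shared scale (symmetric, per-tensor).

    Generic illustration; not the algorithm used by Quantization-Tools.
    """
    max_abs = max(abs(v) for v in values)
    # Avoid division by zero for an all-zero tensor.
    scale = 127.0 / max_abs if max_abs > 0 else 1.0
    codes = []
    for v in values:
        q = int(round(v * scale))
        codes.append(max(-128, min(127, q)))  # saturate to the int8 range
    return codes, scale


def dequantize_int8(codes, scale):
    """Recover approximate float values from int8 codes."""
    return [q / scale for q in codes]


weights = [0.6, -1.0, 0.25, 0.0]
codes, scale = quantize_int8(weights)
restored = dequantize_int8(codes, scale)
```

Each restored value differs from the original by at most half a quantization step, which is the precision traded for the smaller 8-bit representation.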

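The architecture of Quantization-Tools is as follows:

1. FP32 Umodel Conversion

Models built with open-source frameworks (Caffe, TensorFlow, PyTorch, MXNet) must first be converted with the quantization tool into fp32umodel, the private format of the Bitmain quantization platform. From the fp32umodel onward, the quantization flow is decoupled from the open-source frameworks, and quantization calibration runs as a common flow. The Bitmain quantization platform is modeled on the Caffe framework, so it supports caffemodel natively: a caffemodel needs no conversion tool and can be used directly as the input to int8 calibration. TensorFlow, PyTorch, MXNet, and other models, however, must first be converted to fp32umodel.

1.1 Converting a Caffe model to fp32umodel

caffemodel is supported natively; a .caffemodel file can be used directly as the .fp32umodel input.

1.2 Converting a TensorFlow model to fp32umodel

The BM1684 platform supports deploying and running TF int8 models. To quantize a standard TF model (*.pb), first convert it to fp32umodel and then follow the standard quantization flow. The following describes how to use the tool provided by Quantization-Tools to convert a TF model to fp32umodel.

The quantization toolkit provides a Python tool named pb_to_umodel. For example, the pb_to_fp32umodel example shows how to convert TF resnet50_v2.pb to fp32umodel: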

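    import pb_to_umodel

    # Arguments for converting the frozen TF ResNet50 v2 graph to fp32umodel.
    tf_resnet50 = [
        '-m', './models/frozen_resnet_v2_50.pb',  # path to the *.pb file
        '-i', 'input:0',  # input tensor name
        '-o', 'resnet_v2_50/predictions/Softmax:0',  # output tensor name
        '-s', '(1, 299, 299, 3)',  # input tensor shape (N, H, W, C)
        '-d', 'compilation',  # output folder
        '-n', 'resnet50_v2',  # network name
        '-p', 'INCEPTION',  # data preprocessing type
        '-D', '../classify_demo/lmdb/imagenet_s/ilsvrc12_val_lmdb',  # lmdb dataset
        '-a',  # add top1/top5 accuracy layers
        '-t'  # fixed argument
    ]

    if __name__ == '__main__':
        args = pb_to_umodel.parse_args().parse_args(tf_resnet50)
        pb_to_umodel.convert_pb_to_umodelv2(args)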

In the docker environment, run:

python3 resnet50_v2_to_umodel.py

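A new compilation folder is created in the current directory; it holds the newly generated .fp32umodel and .prototxt files.

Command-line arguments of pb_to_umodel:

| args | Description |
| --- | --- |
| -m | Path to the *.pb file |
| -i | Name of the input tensor |
| -o | Name of the output tensor |
| -s | Shape of the input tensor, (N,H,W,C) |
| -d | Name of the output folder |
| -n | Name of the network |
| -p | Data preprocessing type. VGG, INCEPTION, SSD_V, and SSD_I are predefined; if none fits, pick any one and then add the actual preprocessing when manually editing the prototxt file |
| -D | Location of the lmdb dataset. If none is available yet, enter any placeholder path and then set the actual path when manually editing the prototxt file |
| -a | When given, adds two accuracy layers, top1 and top5, to the generated model |
| -t | Fixed argument |

For the complete files, see the pb_to_fp32umodel example.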
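
The argument list can also be assembled programmatically before it is handed to the parser. The sketch below is a hypothetical helper (`build_pb_to_umodel_args` is not part of the toolkit) that only builds the list of strings from the flags listed above; it does not import or call pb_to_umodel itself.

```python
def build_pb_to_umodel_args(pb_path, input_name, output_name, input_shape,
                            out_dir, net_name, preprocess, lmdb_path,
                            add_accuracy=True):
    """Build the argument list for pb_to_umodel (hypothetical helper)."""
    if len(input_shape) != 4:
        raise ValueError("input shape must be (N, H, W, C)")
    shape_text = "(" + ", ".join(str(d) for d in input_shape) + ")"
    args = [
        "-m", pb_path,
        "-i", input_name,
        "-o", output_name,
        "-s", shape_text,
        "-d", out_dir,
        "-n", net_name,
        "-p", preprocess,
        "-D", lmdb_path,
    ]
    if add_accuracy:
        args.append("-a")  # adds the top1/top5 accuracy layers
    args.append("-t")      # fixed argument, always present
    return args


resnet_args = build_pb_to_umodel_args(
    "./models/frozen_resnet_v2_50.pb", "input:0",
    "resnet_v2_50/predictions/Softmax:0", (1, 299, 299, 3),
    "compilation", "resnet50_v2", "INCEPTION",
    "../classify_demo/lmdb/imagenet_s/ilsvrc12_val_lmdb",
)
```

The resulting list could be passed in place of `tf_resnet50` in the example script.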

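1.3 Converting a PyTorch model to fp32umodel

Not yet released

1.4 Converting an MXNet model to fp32umodel

Not yet released

2. Converting FP32 Umodel to INT8 Umodel

This flow converts an fp32umodel to an int8umodel; that is, it is the quantization process.

Int8 quantization based on the fp32umodel network model consists of the following main steps:

2.1 Prepare the lmdb dataset

2.2 Generate the int8umodel

2.3 Accuracy test (optional)

2.1 Preparing the lmdb dataset

The original dataset is converted to lmdb format for use in the subsequent calibration and quantization.

There are two ways to generate an lmdb dataset. The first uses the convert_imageset tool to generate it directly from a set of test images. The second uses the u_framework interface and is mainly intended for cascaded networks, in which the input of a later-stage network depends on the output of an earlier stage, for example mtcnn.

This section covers how to generate the dataset with the convert_imageset tool, which is the general approach. Generating the lmdb required by cascaded networks with u_framework is described in detail in the mtcnn demo.

The SDK toolkit includes convert_imageset, a tool for converting images to lmdb; it is used in much the same way as in the Caffe framework.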

   Usage:
       convert_imageset [FLAGS] ROOTFOLDER/ LISTFILE DB_NAME

Here ROOTFOLDER is the root directory of the image set, LISTFILE is a list file that records the path and corresponding annotation of each image in the set, and DB_NAME is the name of the database to generate.

Parameters:

| args | description |
| --- | --- |
| -backend | The backend (lmdb, leveldb) for storing the result. type: string default: "lmdb" |
| -check_size | When this option is on, check that all the datum have the same size. type: bool default: false |
| -encode_type | Optional: what type should we encode the image as ('png','jpg',...). type: string default: "" |
| -encoded | When this option is on, the encoded image will be saved in datum. type: bool default: false |
| -gray | When this option is on, treat images as grayscale ones. type: bool default: false |
| -resize_height | Height images are resized to. type: int32 default: 0 |
| -resize_width | Width images are resized to. type: int32 default: 0 |
| -shuffle | Randomly shuffle the order of images and their labels. type: bool default: false |

See the example in the SDK:

     # argv[1] : image dir
     # argv[2] : image list
     convert_imageset --shuffle --resize_height=256 --resize_width=256 \
               $IMG_DIR/  $IMG_DIR/ImgList.txt  $IMG_DIR/img_lmdb
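
convert_imageset needs a list file before it can run. As an illustration, the sketch below writes one `relative/path label` line per image, which is the list format used by Caffe's convert_imageset; whether this SDK's build expects exactly the same format is an assumption. The helper name `write_image_list`, the extension set, and the constant label `0` are hypothetical choices.

```python
import os
import tempfile

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp"}  # assumed extension set


def write_image_list(root_dir, list_path, label=0):
    """Write one 'relative/path label' line per image file under root_dir."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in filenames:
            if os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS:
                full = os.path.join(dirpath, name)
                entries.append(os.path.relpath(full, root_dir))
    entries.sort()  # deterministic order; --shuffle can randomize later
    with open(list_path, "w") as handle:
        for rel in entries:
            handle.write("{} {}\n".format(rel, label))
    return entries


# Small self-check on a temporary directory with two fake images.
with tempfile.TemporaryDirectory() as img_dir:
    for name in ("b.jpg", "a.png", "notes.txt"):
        open(os.path.join(img_dir, name), "w").close()
    list_file = os.path.join(img_dir, "ImgList.txt")
    listed = write_image_list(img_dir, list_file)
    with open(list_file) as handle:
        list_lines = handle.read().splitlines()
```

Non-image files such as `notes.txt` are skipped, so the list contains only entries that convert_imageset can load.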

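To use the lmdb dataset just generated, modify the network's *.prototxt file in the following three ways:

Use a Data layer as the network input
Point the data_param parameter of the Data layer to the location of the generated lmdb dataset
Modify the transform_param parameter of the Data layer to match the image preprocessing the network expects

For example:

After these steps, the lmdb configuration is complete.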
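
As a sketch of what the three changes amount to, the helper below renders a Data layer as prototxt text. It uses the field names of the standard Caffe `Data` layer (`data_param.source`, `batch_size`, `backend`, and `transform_param` with `mean_value` and `scale`); the lmdb path, batch size, and mean values are placeholders, the helper `render_data_layer` is hypothetical, and the layer generated by the Bitmain tools may differ in detail.

```python
def render_data_layer(lmdb_path, batch_size=1, mean_values=(), scale=1.0):
    """Return prototxt text for a Caffe-style Data layer reading from lmdb."""
    lines = [
        "layer {",
        '  name: "data"',
        '  type: "Data"',
        '  top: "data"',
        '  top: "label"',
        "  transform_param {",
    ]
    for mean in mean_values:
        lines.append("    mean_value: {}".format(mean))
    lines.append("    scale: {}".format(scale))
    lines += [
        "  }",
        "  data_param {",
        '    source: "{}"'.format(lmdb_path),
        "    batch_size: {}".format(batch_size),
        "    backend: LMDB",
        "  }",
        "}",
    ]
    return "\n".join(lines)


# Placeholder path and preprocessing values, for illustration only.
layer_text = render_data_layer("PATH_TO/img_lmdb", batch_size=1,
                               mean_values=(104, 117, 123), scale=1.0)
print(layer_text)
```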

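In addition, the Data layer parameters in the prototxt can be configured flexibly to match the network's actual input characteristics. The supported transform parameters:

Note:

When editing the prototxt file, parameters defined inside transform_op and parameters defined outside transform_op are mutually exclusive; use only one of the two. For example, of the two configurations in the figure above (left and right), one uses transform_op parameters and the other does not.
Parameters defined inside transform_op are executed in the order in which they are defined in the prototxt, which suits flexible combinations of preprocessing steps.
Parameters defined outside transform_op are executed in a fixed order. The preprocessing order is as follows:

The parameters supported by transform_param (TransformationParameter) are as follows: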

    message TransformationParameter {
      // For data pre-processing, we can do simple scaling and subtracting the
      // data mean, if provided. Note that the mean subtraction is always carried
      // out before scaling.
      optional float scale = 1 [default = 1];
      // Specify if we want to randomly mirror data.
      optional bool mirror = 2 [default = false];
      // Specify if we would like to randomly crop an image.
      optional uint32 crop_size = 3 [default = 0];
      // mean_file and mean_value cannot be specified at the same time
      optional string mean_file = 4;
      // if specified can be repeated once (would subtract it from all the channels)
      // or can be repeated the same number of times as channels
      // (would subtract them from the corresponding channel)
      repeated float mean_value = 5;
      // Force the decoded image to have 3 color channels.
      optional bool force_color = 6 [default = false];
      // Force the decoded image to have 1 color channels.
      optional bool force_gray = 7 [default = false];
      // Resize policy
      optional ResizeParameter resize_param = 8;
      // Noise policy
      optional NoiseParameter noise_param = 9;
      // Constraint for emitting the annotation after transformation.
      optional EmitConstraint emit_constraint = 10;
      optional uint32 crop_h = 11 [default = 0];
      optional uint32 crop_w = 12 [default = 0];
      // Distortion policy
      optional DistortionParameter distort_param = 13;
      // Expand policy
      optional ExpansionParameter expand_param = 14;
      // TensorFlow data pre-processing
      optional float crop_fraction = 15 [default = 0];
      // if the number of resize is 1 preserve the original aspect ratio
      repeated uint32 resize = 16;
      // less useful
      optional bool standardization = 17 [default = false];
      repeated TransformOp transform_op = 18;
    }

ResizeParameter definition:

    // Message that stores parameters used by data transformer for resize policy
    message ResizeParameter {
      // Probability of using this resize policy
      optional float prob = 1 [default = 1];
      enum Resize_mode {
        WARP = 1;
        FIT_SMALL_SIZE = 2;
        FIT_LARGE_SIZE_AND_PAD = 3;
      }
      optional Resize_mode resize_mode = 2 [default = WARP];
      optional uint32 height = 3 [default = 0];
      optional uint32 width = 4 [default = 0];
      // A parameter used to update bbox in FIT_SMALL_SIZE mode.
      optional uint32 height_scale = 8 [default = 0];
      optional uint32 width_scale = 9 [default = 0];
      enum Pad_mode {
        CONSTANT = 1;
        MIRRORED = 2;
        REPEAT_NEAREST = 3;
      }
      // Padding mode for BE_SMALL_SIZE_AND_PAD mode and object centering
      optional Pad_mode pad_mode = 5 [default = CONSTANT];
      // if specified can be repeated once (would fill all the channels)
      // or can be repeated the same number of times as channels
      // (would use them for the corresponding channel)
      repeated float pad_value = 6;
      enum Interp_mode {  // Same as in OpenCV
        LINEAR = 1;
        AREA = 2;
        NEAREST = 3;
        CUBIC = 4;
        LANCZOS4 = 5;
      }
      // Interpolation for resizing
      repeated Interp_mode interp_mode = 7;
    }

The transform_op field of the transform parameters is defined as follows:

    message TransformOp {
      enum Op {
        RESIZE = 0;
        CROP = 1;
        STAND = 2;
        NONE = 3;
      }
      // For historical reasons, the default normalization for
      // SigmoidCrossEntropyLoss is BATCH_SIZE and *not* VALID.
      optional Op op = 1 [default = NONE];
      // resize parameters
      optional uint32 resize_side = 2;
      optional uint32 resize_h = 3 [default = 0];
      optional uint32 resize_w = 4 [default = 0];
      // crop parameters
      optional float crop_fraction = 5;
      optional uint32 crop_h = 6 [default = 0];
      optional uint32 crop_w = 7 [default = 0];
      optional float padding = 8 [default = 0];  // for resize_with_crop_or_pad
      // mean subtraction (stand)
      repeated float mean_value = 9;
      optional string mean_file = 10;
      optional float scale = 11 [default = 1];
      optional float div = 12 [default = 1];
      optional bool bgr2rgb = 13 [default = false];
    }
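
To illustrate what "executed in the order defined" means for transform_op entries, here is a minimal pure-Python sketch that applies a list of ops one after another to a tiny single-channel image. It is not the tool's implementation: the centered placement of CROP and the STAND formula `(x - mean_value) * scale / div` are assumptions, RESIZE is omitted, and all function names are hypothetical.

```python
def apply_crop(image, crop_h, crop_w):
    """Center-crop a 2-D list of pixel values (centering is an assumption)."""
    top = (len(image) - crop_h) // 2
    left = (len(image[0]) - crop_w) // 2
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]


def apply_stand(image, mean_value=0.0, scale=1.0, div=1.0):
    """Assumed STAND semantics: (x - mean_value) * scale / div."""
    return [[(x - mean_value) * scale / div for x in row] for row in image]


def run_transform_ops(image, ops):
    """Apply ops strictly in the order they are listed, like transform_op."""
    handlers = {"CROP": apply_crop, "STAND": apply_stand}
    for op_name, params in ops:
        image = handlers[op_name](image, **params)
    return image


image = [
    [10, 20, 30, 40],
    [50, 60, 70, 80],
    [90, 100, 110, 120],
    [130, 140, 150, 160],
]
ops = [
    ("CROP", {"crop_h": 2, "crop_w": 2}),
    ("STAND", {"mean_value": 60.0, "scale": 1.0, "div": 10.0}),
]
result = run_transform_ops(image, ops)
```

Reordering the entries in `ops` changes the order of execution, which is the flexibility that the fixed-order parameters outside transform_op do not offer.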

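The steps above are mainly about configuring the Data layer parameters in the prototxt correctly so that the lmdb can be used.

Handling lmdb with annotated information

For detection networks, the label is not just a number; it contains complex information such as the class and the position of the bounding box. There are two cases:

1. If the lmdb has already been generated and carries annotated information, the prototxt must use an AnnotatedData layer to load the lmdb,

for example:

2. If the lmdb has not been generated yet, then when using the convert_imageset tool, change the label information in the filelist to a random number (less than 1000) and load the resulting lmdb with an ordinary Data layer. (During network quantization, annotated information such as labels and bbox coordinates is not required; it is only necessary that the images used to generate the lmdb are positive samples.)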
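
For the second case, the label column of an existing filelist can be rewritten mechanically. The sketch below assumes each line starts with the image path followed by whitespace and the annotation, and replaces the annotation with a random integer below 1000; the helper name `randomize_labels` and the line format are assumptions.

```python
import random


def randomize_labels(lines, upper=1000, seed=None):
    """Replace everything after the image path with a random label < upper."""
    rng = random.Random(seed)
    relabeled = []
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # skip blank lines
        image_path = stripped.split(None, 1)[0]
        relabeled.append("{} {}".format(image_path, rng.randrange(upper)))
    return relabeled


filelist = [
    "images/000001.jpg annotations/000001.xml",
    "",
    "images/000002.jpg annotations/000002.xml",
]
new_lines = randomize_labels(filelist, seed=0)
```

Image paths that contain spaces would need a different split rule.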

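2.2 Generating the int8umodel

The SDK provides a tool that converts an fp32umodel (.caffemodel) directly into Bitmain's private intermediate model (int8umodel). The quantization tool uses calibration_use_pb to perform the calibration. The command is used as follows:

    # release and -bitwidth=TO_INT8 are fixed arguments
    # -model:      file describing the network structure
    # -weights:    network coefficient file (a caffemodel can be used directly)
    # -iterations: number of iterations (how many images are used during
    #              quantization; each iteration uses one image)
    calibration_use_pb \
          release \
          -model=PATH_TO/**.prototxt \
          -weights=PATH_TO/**.fp32umodel \
          -iterations=1000 \
          -bitwidth=TO_INT8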

For example, the conversion step for the SSD model in the examples:

      function calibration() 
      {
        calibration_use_pb release \
          -model=./ssd300_umodel.prototxt   \
          -weights=./ssd300.caffemodel  \
          -iterations=1000 \
          -bitwidth=TO_INT8

      }
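
When calibration is driven from Python, the same command line can be assembled as an argument list. The sketch below only builds the list from the flags shown above; the helper name `build_calibration_cmd` is hypothetical, and actually running the result (for instance with `subprocess.run`) assumes calibration_use_pb is on the PATH.

```python
def build_calibration_cmd(prototxt, weights, iterations=1000):
    """Build the calibration_use_pb command as a list of arguments."""
    if iterations <= 0:
        raise ValueError("iterations must be positive")
    return [
        "calibration_use_pb",
        "release",                       # fixed argument
        "-model={}".format(prototxt),    # network structure file
        "-weights={}".format(weights),   # fp32umodel or caffemodel
        "-iterations={}".format(iterations),
        "-bitwidth=TO_INT8",             # fixed argument
    ]


cmd = build_calibration_cmd("./ssd300_umodel.prototxt", "./ssd300.caffemodel")
print(" ".join(cmd))
```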

A successful run produces the following output files:

      ├── ssd300_deploy_fp32_unique_top.prototxt
      ├── ssd300_deploy_int8_unique_top.prototxt
      ├── ssd300.int8umodel
      ├── ssd300_test_fp32_unique_top.prototxt
      ├── ssd300_test_int8_unique_top.prototxt
      └── ssd300_umodel.prototxt

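*.int8umodel is the int8 network coefficient file produced by quantization.

*_deploy_int8_unique_top.prototxt is the int8 network structure file.

*_deploy_fp32_unique_top.prototxt is the fp32 network structure file. The int8_unique_top.prototxt file is used later to generate the bmodel for deployment; the fp32 file is for comparing the differences between the two. Because these are deployment files, the prototxt contains an Input layer rather than a Data layer, and the output blob of every layer is unique, so there are no in-place layers.

*_test_int8_unique_top.prototxt and *_test_fp32_unique_top.prototxt are the int8 and fp32 network structure files, respectively. They are used for accuracy comparison to verify the quantization result, and accordingly both contain a Data layer.

The main difference between the contents of deploy_int8_*.prototxt and test_int8_*.prototxt is, for example:

Note that, compared with the test int8 prototxt, the deploy int8 prototxt has an additional int8_scale parameter before the first layer; at deployment, inference must scale the input data by this parameter. The meaning of int8_scale is: on top of the data preprocessing of the original float32 network, multiply by int8_scale, round to the nearest integer, and then feed the result to the int8 network.

All files generated by the quantization calibration above are stored in the same directory as the fp32umodel. After quantization, the files needed by the next step are *_deploy_int8_unique_top.prototxt and *.int8umodel.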
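
The int8_scale step can be sketched as follows. The multiply-then-round behavior comes from the description above; rounding half away from zero, saturating to the signed range [-128, 127], the example scale value, and the helper name `scale_input_for_int8` are all assumptions.

```python
import math


def scale_input_for_int8(preprocessed, int8_scale):
    """Multiply preprocessed float inputs by int8_scale and round.

    Round-half-away-from-zero and saturation to [-128, 127] are assumptions.
    """
    scaled = []
    for value in preprocessed:
        x = value * int8_scale
        rounded = int(math.copysign(math.floor(abs(x) + 0.5), x))
        scaled.append(max(-128, min(127, rounded)))
    return scaled


# Hypothetical values: float32-preprocessed pixels and an example int8_scale.
pixels = [-1.0, -0.3, 0.0, 0.5, 2.0]
int8_inputs = scale_input_for_int8(pixels, int8_scale=64.0)
```

The real int8_scale value should be read from the generated deploy int8 prototxt rather than chosen by hand.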
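
A quick check that calibration produced the files the next step needs can be scripted. The sketch below derives the file names from the SSD example's output listing; that other models follow the same NAME_{deploy,test}_{fp32,int8}_unique_top.prototxt pattern is an assumption, and both helper names are hypothetical.

```python
import os


def expected_calibration_outputs(model_name):
    """List the output file names for a model, following the SSD example."""
    names = ["{}.int8umodel".format(model_name)]
    for stage in ("deploy", "test"):
        for precision in ("fp32", "int8"):
            names.append("{}_{}_{}_unique_top.prototxt".format(
                model_name, stage, precision))
    return names


def missing_outputs(directory, model_name):
    """Return the expected output files that are not present in directory."""
    return [name for name in expected_calibration_outputs(model_name)
            if not os.path.isfile(os.path.join(directory, name))]


outputs = expected_calibration_outputs("ssd300")
```

Calling `missing_outputs` on the fp32umodel directory after calibration returns an empty list when every expected file is present.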

