Changelog


All notable changes to the Bolt project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[1.2.1] - 2021-09-11

Added

  • Support more graph optimizations: Convolution+Convolution, LayerNorm
  • Support more operators: ROIAlign, GenerateProposals, Reciprocal, Not, Log, ReductionL2, InstanceNorm, Expand, Gather, Scatter
  • Support NCHW input data for more operators (PReLU)
  • Support ONNX weight sharing between Linear, MatMul, Gemm and Gather
  • Support more networks on CPU: vision transformers (ViT, TNT), recommendation networks
  • Support more networks on GPU: ASR, Faster_RCNN
  • Support Armv7 int8 to accelerate NLP networks (50%+ speed-up)
  • Support X86 AVX512 int8 to accelerate NLP networks (3x+ speed-up)
  • Support using images on Qualcomm GPU, and add GPU image management methods
  • Improve inference performance on Qualcomm GPU
  • Add more kit Android/iOS demos: Chinese ASR, Face Detection, Sentiment Analysis
  • Try to bind CPU cores when using GPU

Changed

  • Replace the mali option with gpu in the install shell script, and remove the default target option setting
  • Change the GPU data format from NCWHC4 to NCHWC4
  • Simplify the GPU tensor padding method with OclMemory
  • The preprocess_ocl tool previously produced both an algofile and xxxlib.so; the algofile is now packaged into xxxlib.so
  • Add BNN_FP16 option in X2bolt tool to convert ONNX 1-bit models
  • Replace the original INT8 option with INT8_FP16 in the post_training_quantization tool to convert int8+float16 hybrid inference models, and add an INT8_FP32 option to convert int8+float32 hybrid inference models
  • Add the shell environment variable BOLT_INT8_STORAGE_ERROR_THRESHOLD (default 0.002) to control how post_training_quantization converts int8 models: post_training_quantization uses int8 storage when the quantization error is lower than BOLT_INT8_STORAGE_ERROR_THRESHOLD
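
The threshold rule for BOLT_INT8_STORAGE_ERROR_THRESHOLD can be sketched as follows. This is a minimal Python illustration under stated assumptions, not Bolt's implementation: the helper names int8_storage_threshold and use_int8_storage are hypothetical, and since this entry does not say how post_training_quantization measures quantization error, the error is treated as an opaque float.

```python
import os

# Default value stated in the 1.2.1 changelog entry.
DEFAULT_THRESHOLD = 0.002


def int8_storage_threshold() -> float:
    """Hypothetical helper: read the env var, falling back to the documented default."""
    value = os.environ.get("BOLT_INT8_STORAGE_ERROR_THRESHOLD")
    return float(value) if value is not None else DEFAULT_THRESHOLD


def use_int8_storage(quantization_error: float) -> bool:
    """Hypothetical helper: int8 storage is chosen only when the error is strictly below the threshold."""
    return quantization_error < int8_storage_threshold()
```

Raising the variable (for example to 0.01) makes int8 storage accepted for more tensors; lowering it keeps more tensors in higher-precision storage.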

Fixed

  • Fix PReLU 2D and 3D support
  • Fix Resize bug in some modes
  • Fix ONNX converter bug when reading Squeeze, UnSqueeze and Deconv parameters
  • Fix Arm Sigmoid precision
  • Fix ONNX RNN optimizer, and add support for NCHWC8 input data
  • Fix Concat with weight tensor in ONNX converter
  • Simplify C API example

Removed

[1.2.0] - 2021-03-15

Added

  • Support x86 compilation and cross-compilation for iOS/Android on macOS
  • Support x86 compilation and cross-compilation for Android on Windows
  • Support MTK Armv7 cross-compilation toolchains on Linux by using the linux-armv7_blank target
  • Add Gitbook for user reference
  • Support image nearest Resize and align_corners Resize
  • Support more graph optimizations: Transpose+Concat+Transpose, Padding+Transpose, HardSwish-Fusion, Relu6-Fusion, Resize-Fusion, SwapTransposeEltwise, SwapPadTranspose, Convolution+Eltwise, Transpose+Matmul
  • Support more operators: 3D-convolution, Where, SoftPlus, Exp, Split, Tdnn, Dropout, TopK, SpaceToBatchNd, BatchToSpaceNd, Abs, Equal, Sign, Resize (more modes)
  • Support more networks on CPU: Reactnet, Tdnn, ShuffleNet, DenseNet, Hrnet, Efficientnet, Noah KWS2.0
  • Support more networks on Mali GPU: TinyBert, NMT
  • Add more kit Android/iOS demos: Simple-Image-Classification, Image-SuperResolution, Image-Classification
  • Support float16, int8 model storage on any hardware
  • Add Flow Java API

Changed

  • Change the install and GPU library processing shell scripts
  • Optimize TfSlice with 75%+ speed-up on CPU
  • Optimize Concat with 50%+ speed-up on CPU
  • Optimize Deconvolution with 10%+ speed-up on CPU
  • Optimize YoloDetection network with 15%+ speed-up on CPU
  • Optimize resnet50 from 90ms+ to 70ms+ on x86, faster than OpenVINO
  • Optimize mobilenet v1/v2 with 10%+ speed-up on x86
  • Optimize tts-melgan network from 200ms+ to 160ms on x86
  • Optimize model read time
  • Change the Java API package name to com.huawei.noah, and split the single API file into 6 files

Fixed

  • Fix bug where op/tensor names longer than 128 characters were not supported
  • Fix Caffe input dims extraction bug
  • Fix Concat with single input in ONNX converter
  • Fix missing support for padding (NHWC)
  • Fix relu6 insertion in tflite converter
  • Fix GRU, LSTM and LBR_GRU model converter and inference bugs
  • Fix X86 convolution, fully connected operators inference bug

Removed

  • Remove the third-party library FFTW and use FFTS for the ASR example

[1.0.0] - 2020-11-20

Added

  • Support fp32 on X86 AVX2 CPU
  • Support multi-threaded parallelism for some fp32 operators (convolution, LSTM)
  • Support TensorFlow models
  • Support more networks (Pointnet, ...)
  • Support int8 inference for more networks (TinyBert, NMT, ASR)
  • Support time-series data acceleration
  • Support Apple iOS phones

[0.3.0] - 2020-06-01

Added

  • Optimized fp16 on ARM MALI GPU
  • Support fp32 on ARMv7 CPU
  • Support int8 PTQ calibration
  • Support more networks(SSD, ASR, TTS)
  • Support image classification task on ARM MALI GPU

[0.2.0] - 2020-03-06

Added

  • Support fp32 on ARMv8 CPU
  • Support fp16 on ARM MALI GPU
  • Support memory reuse for feature maps and weight-sharing between operators
  • Support dynamic input size
  • Support CPU affinity setting
  • Support convolution algorithm auto-tuning (runtime or full parameter space search)
  • Support Java and C API

[0.1.0] - 2019-12-01

Added

  • Support Caffe/ONNX/TFLite
  • Support fp16/int8/binary
  • Support Sequential/CNN/LSTM (common models of CV and NLP)
