
Quantization Toolchain

Bolt currently supports several modes of post-training quantization, including quantized storage, dynamic quantization at inference time, and offline calibration. Quantization-aware training tools will be provided in a future release.

Post-Training Quantization

Please refer to model_tools/tools/quantization/post_training_quantization.cpp; all of the post-training quantization utilities are implemented in this tool.

Before using this tool, you first need to produce its input model with X2bolt using the "-i PTQ" option.
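
For example, a conversion command might look like the following (an illustrative sketch; the model directory and model name are hypothetical placeholders, and only the "-i PTQ" flag is the essential part here):

```bash
# Convert the original model into a .bolt model prepared for
# post-training quantization. The directory and model name below
# are placeholders; adapt them to your own model.
./X2bolt -d /path/to/model_dir -m my_model -i PTQ
```

Later, you can use the tool: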

```bash
./post_training_quantization --help
./post_training_quantization -p model_ptq_input.bolt
```

The different options of the tool are explained below. The default setting produces model_int8_q.bolt, which will be executed with dynamic int8 quantization. INT8_FP16 is for machines (ARMv8.2+) that support fp16, which is then used to compute the non-quantized operators. INT8_FP32 is for machines (ARMv7, ARMv8, Intel AVX-512) that compute the non-quantized operators in fp32. The command above is equivalent to this one:

```bash
./post_training_quantization -p model_ptq_input.bolt -i INT8_FP16 -b true -q NOQUANT -c 0 -o false
```
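
As a sketch of the precision switch, on a machine without fp16 support (for example an ARMv7 or Intel AVX-512 host) you would select the fp32 fallback instead; the remaining flags below simply repeat the defaults shown above:

```bash
# Same default behavior, but non-quantized operators run in fp32
# (for ARMv7/ARMv8 without fp16, or Intel AVX-512 machines).
./post_training_quantization -p model_ptq_input.bolt -i INT8_FP32 -b true -q NOQUANT -c 0 -o false
```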

Here is the list of covered utilities: