Vitis AI Quantizer for TensorFlow#

Note

The Vitis AI Quantizer has been deprecated as of the Ryzen AI 1.3 release. AMD strongly recommends using the new AMD Quark Quantizer instead (please refer to the main documentation about Model Quantization).

Note

All TensorFlow-related documentation in this section applies to TensorFlow 2.

Installation#

The Vitis AI Quantizer for TensorFlow is distributed through a Docker container that can be installed on Ubuntu 20.04; CentOS 7.8, 7.9, and 8.1; and RHEL 8.3 and 8.4. Developers working on Windows 11 can use WSL to run the Vitis AI Docker container.

Standard Container#

To install the Docker container for the Vitis AI Quantizer for TensorFlow, follow the instructions provided here: https://hub.docker.com/r/amdih/ryzen-ai-tensorflow2

GPU-Accelerated Container#

The standard Vitis AI Docker container does not support GPU-accelerated quantization. To create a container with GPU-accelerated quantization enabled, download the following archive and follow the instructions in the README file.

Download and build GPU Docker containers

Enabling Quantization#

To enable the Vitis AI Quantizer for TensorFlow, activate the conda environment in the Vitis AI TensorFlow 2 container:

conda activate vitis-ai-tensorflow2
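
To verify that the environment is set up correctly, the quantizer module can be imported from Python. A minimal sanity check (the import path is the same one used in the examples below):

# Run inside the activated vitis-ai-tensorflow2 environment.
import tensorflow as tf
from tensorflow_model_optimization.quantization.keras import vitis_quantize

print(tf.__version__)                 # a TensorFlow 2.x version is expected
print(vitis_quantize.VitisQuantizer)  # the quantizer entry point used below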

Post-Training Quantization#

Post-Training Quantization requires the following files:

  1. Float model: A floating-point TensorFlow model, in either HDF5 (.h5) format or the TensorFlow SavedModel format.

  2. Calibration dataset: A subset of the training dataset containing 100 to 1000 images (a minimal sketch of assembling such a dataset is shown after this list).
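
Labels are not needed for calibration. As a minimal sketch, assume a subset of preprocessed training images has been saved as a NumPy array (the filename and shapes are hypothetical):

import numpy as np
import tensorflow as tf

# Hypothetical pre-saved subset of training images, shape (N, H, W, C),
# already preprocessed the same way as during training.
calib_images = np.load('calib_images.npy')

# Any 100 to 1000 samples drawn from the training distribution will do;
# labels are not required for calibration.
calib_dataset = tf.data.Dataset.from_tensor_slices(calib_images).batch(10)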

A complete example of Post-Training Quantization is available in the Vitis AI GitHub repository.

Vitis AI Quantization APIs#

Vitis AI provides the vitis_quantize module within the TensorFlow Model Optimization (tensorflow_model_optimization) library for quantization. The following code shows its usage:

import tensorflow as tf
from tensorflow_model_optimization.quantization.keras import vitis_quantize

model = tf.keras.models.load_model('float_model.h5')
quantizer = vitis_quantize.VitisQuantizer(model)
quantized_model = quantizer.quantize_model(calib_dataset=calib_dataset,
                                           calib_steps=100,
                                           calib_batch_size=10,
                                           **kwargs)
  • calib_dataset: A representative dataset used for calibration.

  • calib_steps: The total number of calibration steps.

  • calib_batch_size: The number of samples per calibration batch.

  • input_shape: The input shape for each input layer.

  • kwargs: A dictionary of user-defined quantize strategy configurations.
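
The returned quantized_model is a regular Keras model, so its post-quantization accuracy can be checked with the standard compile/evaluate workflow. A minimal sketch, assuming a labeled eval_dataset and a classification model (both are assumptions, not part of the API above):

import tensorflow as tf

# eval_dataset, the loss, and the metric are assumptions; use whatever
# matches the original float model's evaluation setup.
quantized_model.compile(
    loss=tf.keras.losses.SparseCategoricalCrossentropy(),
    metrics=[tf.keras.metrics.SparseCategoricalAccuracy()])
quantized_model.evaluate(eval_dataset)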

Exporting the Model for Deployment#

After quantization, the quantized model can be saved in the ONNX format for deployment with the ONNX Runtime Vitis AI Execution Provider:

quantized_model = quantizer.quantize_model(calib_dataset=calib_dataset,
                                           output_format='onnx',
                                           onnx_opset_version=11,
                                           output_dir='./quantize_results',
                                           **kwargs)
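
The exported model can then be loaded with ONNX Runtime. A minimal sketch, assuming the exported file is named quantized_model.onnx inside output_dir (check the actual filename in ./quantize_results) and that the Vitis AI Execution Provider is installed:

import numpy as np
import onnxruntime as ort

# The filename, input shape, and provider setup are assumptions; consult
# the Ryzen AI deployment documentation for the exact configuration.
session = ort.InferenceSession('./quantize_results/quantized_model.onnx',
                               providers=['VitisAIExecutionProvider'])

input_name = session.get_inputs()[0].name
dummy_input = np.zeros((1, 224, 224, 3), dtype=np.float32)  # example shape
outputs = session.run(None, {input_name: dummy_input})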

Fast Finetuning#

After post-training quantization, there is usually a small accuracy loss. If the accuracy loss is large, a fast finetuning approach based on the AdaQuant algorithm can be tried instead of quantization-aware training. This approach uses a small set of unlabeled data to calibrate the activations and finetune the weights.

quantized_model = quantizer.quantize_model(calib_dataset=calib_dataset, calib_steps=None, calib_batch_size=None,
                                           include_fast_ft=True, fast_ft_epochs=10)

Fast finetuning related parameters are as follows:

  • include_fast_ft: Whether to perform fast finetuning.

  • fast_ft_epochs: The number of finetuning epochs for each layer.
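
A common workflow is to evaluate the plain post-training quantized model first and enable fast finetuning only if the accuracy drop is too large. A minimal sketch, assuming a labeled eval_dataset, a known float-model accuracy float_acc, and a 1% drop threshold chosen purely for illustration:

import tensorflow as tf

# Evaluate the plain post-training quantized model first.
ptq_model = quantizer.quantize_model(calib_dataset=calib_dataset)
ptq_model.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(),
                  metrics=[tf.keras.metrics.SparseCategoricalAccuracy()])
_, ptq_acc = ptq_model.evaluate(eval_dataset)

# Fall back to fast finetuning only when the drop exceeds the threshold.
if float_acc - ptq_acc > 0.01:
    ptq_model = quantizer.quantize_model(calib_dataset=calib_dataset,
                                         include_fast_ft=True,
                                         fast_ft_epochs=10)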

Quantization Aware Training#

An example of Quantization Aware Training is available in the Vitis AI GitHub repository.

The general steps are as follows:

  1. Prepare the floating point model, training dataset, and training script.

  2. Modify the training script to call VitisQuantizer.get_qat_model, which converts the model into a quantized model, then proceed to train/finetune it:

import tensorflow as tf
from tensorflow_model_optimization.quantization.keras import vitis_quantize

model = tf.keras.models.load_model('float_model.h5')

# Create the quantization-aware training model
quantizer = vitis_quantize.VitisQuantizer(model)
qat_model = quantizer.get_qat_model(init_quant=True,
                                    calib_dataset=calib_dataset)

# Then run the training process with this qat_model to get the finetuned model.
# Compile the model; lr_schedule and train_dataset come from the training script.
qat_model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=lr_schedule),
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(),
                  metrics=[tf.keras.metrics.SparseTopKCategoricalAccuracy()])

# Start the training/finetuning
qat_model.fit(train_dataset)
  3. Call model.save() to save the trained model, or use callbacks in model.fit() to save the model periodically.

# Save the model manually
qat_model.save('trained_model.h5')

# Save the model periodically during fit() using callbacks
qat_model.fit(train_dataset,
              callbacks=[
                  tf.keras.callbacks.ModelCheckpoint(
                      filepath='./quantize_train/',
                      save_best_only=True,
                      monitor='sparse_categorical_accuracy',
                      verbose=1,
                  )])
  4. Convert the model to a deployable state using the get_deploy_model API:

# Convert the trained QAT model into a deployable quantized model
quantized_model = quantizer.get_deploy_model(qat_model)

# The model can also be exported to ONNX, as in the post-training
# quantization flow above
quantized_model = quantizer.quantize_model(calib_dataset=calib_dataset,
                                           output_format='onnx',
                                           onnx_opset_version=11,
                                           output_dir='./quantize_results',
                                           **kwargs)