Vitis AI Quantizer for PyTorch#

Enabling Quantization#

Ensure that the Vitis AI Quantizer for PyTorch is correctly installed. For more information, see the installation instructions.

To enable the Vitis AI Quantizer for PyTorch, activate the conda environment in the Vitis AI PyTorch Docker container:

conda activate vitis-ai-pytorch

Post-Training Quantization#

Post-Training Quantization requires the following files:

  1. model.pth: A pre-trained PyTorch model, generally a .pth file.

  2. model.py: A Python script that includes the float model definition.

  3. Calibration dataset: A subset of the training dataset containing 100 to 1000 images.

A complete example of Post-Training Quantization is available in the Vitis AI GitHub repo.
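As a minimal sketch, the three pieces fit together as follows. Here resnet18 and the ImageFolder wiring are illustrative assumptions, not part of the Vitis AI API:

import torch
from torchvision import datasets, transforms
from torchvision.models import resnet18

# model.py: float model definition
model = resnet18()

# model.pth: pre-trained weights
model.load_state_dict(torch.load('model.pth'))
model.eval()

# calibration dataset: 100 to 1000 images sampled from the training data
train_set = datasets.ImageFolder('data/train', transform=transforms.ToTensor())
calib_set = torch.utils.data.Subset(train_set, range(1000))
calib_loader = torch.utils.data.DataLoader(calib_set, batch_size=32)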

Vitis AI Quantization APIs#

Vitis AI provides the pytorch_nndct module with quantization-related APIs.

  1. Import the vai_q_pytorch module:

from pytorch_nndct.apis import torch_quantizer, dump_xmodel
  2. Generate a quantizer with the required quantization input and get the converted model:

input = torch.randn([batch_size, 3, 224, 224])
quantizer = torch_quantizer(quant_mode, model, (input))
quant_model = quantizer.quant_model
  3. Run a forward pass of the neural network with the converted model:

acc1_gen, acc5_gen, loss_gen = evaluate(quant_model, val_loader, loss_fn)
  4. Output the quantization result:

quantizer.export_quant_config()
  5. Export the quantized model for deployment:

quantizer.export_onnx_model()
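Putting these steps together, here is a hedged sketch of the usual two-pass flow: the script is run once with quant_mode='calib' to collect quantization statistics and once with quant_mode='test' to evaluate and export. The model, val_loader, loss_fn, and evaluate names are assumed to come from your own script:

import torch
from pytorch_nndct.apis import torch_quantizer

def quantize(quant_mode, model, val_loader, loss_fn):
    # a dummy input fixes the shapes the quantizer traces
    input = torch.randn([1, 3, 224, 224])
    quantizer = torch_quantizer(quant_mode, model, (input))
    quant_model = quantizer.quant_model

    # forward passes drive the quantizer: they collect activation
    # statistics in 'calib' mode and measure accuracy in 'test' mode
    acc1, acc5, loss = evaluate(quant_model, val_loader, loss_fn)

    if quant_mode == 'calib':
        quantizer.export_quant_config()   # writes the quantization config
    elif quant_mode == 'test':
        quantizer.export_onnx_model()     # writes the quantized ONNX model

quantize('calib', model, val_loader, loss_fn)
quantize('test', model, val_loader, loss_fn)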

Quantization Output#

If quantization runs successfully, two important files are generated in the output directory ./quantize_result:

  • <model>.onnx: Quantized ONNX model

  • Quant_info.json: Quantization steps of tensors. Retain this file for evaluating quantized models.
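As a quick sanity check of the exported model, the ONNX file can be loaded and run with onnxruntime. This is a generic sketch; on NPU hardware you would configure the appropriate execution provider rather than the default CPU one:

import numpy as np
import onnxruntime as ort

# substitute the actual file name generated under ./quantize_result
session = ort.InferenceSession('./quantize_result/model.onnx')
input_name = session.get_inputs()[0].name
dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)
outputs = session.run(None, {input_name: dummy})
print(outputs[0].shape)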

Hardware-Aware Quantization#

To enable hardware-aware quantization, provide the target NPU-specific architecture as follows:

quantizer = torch_quantizer(quant_mode=quant_mode,
                            module=model,
                            input_args=(input),
                            device=device,
                            quant_config_file=config_file,
                            target=target)

The target for the current version of the NPU is AMD_AIE2_Nx4_Overlay_cfg0, that is, target='AMD_AIE2_Nx4_Overlay_cfg0' in the call above.

Partial Quantization#

Partial quantization can be enabled by using the QuantStub and DeQuantStub operators from the pytorch_nndct library. In the following example, the layers subm0 and subm2 are quantized, but subm1 is not:

from pytorch_nndct.nn import QuantStub, DeQuantStub

class WholeModule(torch.nn.Module):
    def __init__(self,...):
        super().__init__()
        self.subm0 = ...
        self.subm1 = ...
        self.subm2 = ...

        # define QuantStub/DeQuantStub submodules
        self.quant = QuantStub()
        self.dequant = DeQuantStub()

    def forward(self, input):
        input = self.quant(input) # begin of part to be quantized
        output0 = self.subm0(input)
        output0 = self.dequant(output0) # end of part to be quantized

        output1 = self.subm1(output0) # not quantized

        output1 = self.quant(output1) # begin of part to be quantized
        output2 = self.subm2(output1)
        output2 = self.dequant(output2) # end of part to be quantized
        return output2
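The stub-delimited model is then quantized exactly like any other model. A minimal usage sketch, assuming the elided constructor arguments above are filled in:

import torch
from pytorch_nndct.apis import torch_quantizer

model = WholeModule()
input = torch.randn([1, 3, 224, 224])
quantizer = torch_quantizer('calib', model, (input))
quant_model = quantizer.quant_model   # only subm0 and subm2 are quantized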

Fast Finetuning#

After post-training quantization, there is usually a small accuracy loss. If the accuracy loss is large, a fast-finetuning approach based on the AdaQuant algorithm can be tried instead of quantization-aware training. Fast finetuning uses a small amount of unlabeled data to calibrate the activations and finetune the weights.

# fast finetune model or load finetuned parameter before test

if fast_finetune:
    ft_loader, _ = load_data(
               subset_len=5120,
               train=False,
               batch_size=batch_size,
               sample_method='random',
               data_dir=args.data_dir,
               model_name=model_name)
    if quant_mode == 'calib':
        quantizer.fast_finetune(evaluate, (quant_model, ft_loader, loss_fn))
    elif quant_mode == 'test':
        quantizer.load_ft_param()

Quantization Aware Training#

An example of Quantization Aware Training is available in the Vitis AI GitHub repo.

General approaches are:

  1. If some non-module operations need to be quantized, convert them into module operations. For example, ResNet18 uses the + operator to add two tensors, which can be replaced by pytorch_nndct.nn.modules.functional.Add (see the sketch at the end of this section).

  2. If a module is called multiple times, make it unique by defining a separate instance for each call site and calling them separately in the forward pass.

  3. Insert QuantStub and DeQuantStub. Any sub-network from QuantStub to DeQuantStub in a forward pass will be quantized. Multiple QuantStub-DeQuantStub pairs are allowed.

  4. Create a quantizer with the QatProcessor class from the pytorch_nndct library:

from pytorch_nndct import QatProcessor
qat_processor = QatProcessor(model, inputs, bitwidth=8)
quantized_model = qat_processor.trainable_model()
optimizer = torch.optim.Adam(
                  quantized_model.parameters(),
                  lr,
                  weight_decay=weight_decay)
  5. For testing after training, get the deployable model:

output_dir = 'qat_result'
deployable_model = qat_processor.to_deployable(quantized_model, output_dir)
validate(val_loader, deployable_model, criterion, gpu)
  6. Export the ONNX model for prediction:

qat_processor.export_onnx_model()
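As referenced in step 1, the following hedged sketch applies steps 1 through 3 to a residual block. The block body is illustrative; only the Add module path follows the name given in step 1:

import torch
from pytorch_nndct.nn import QuantStub, DeQuantStub
from pytorch_nndct.nn.modules import functional

class ResidualBlock(torch.nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv = torch.nn.Conv2d(channels, channels, 3, padding=1)
        self.relu = torch.nn.ReLU()
        # step 1: a module operation instead of the '+' operator
        self.skip_add = functional.Add()
        # step 3: stubs delimit the sub-network to be quantized
        self.quant = QuantStub()
        self.dequant = DeQuantStub()

    def forward(self, x):
        x = self.quant(x)
        out = self.skip_add(x, self.conv(x))   # replaces x + self.conv(x)
        out = self.relu(out)                   # called once, so step 2 needs no change
        return self.dequant(out)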