Adrian Boguszewski - Beyond the Continuum: The Importance of Quantization in Deep Learning

Discover the importance of quantization in deep learning, including post-training quantization, quantization-aware training, and weight compression. Learn how to optimize models for quantization and reduce storage requirements while maintaining accuracy.

Key takeaways
  • Quantization lets deep learning models run inference faster with smaller storage requirements, at minimal cost in accuracy.
  • Post-training quantization is a technique that involves converting a pre-trained model to a lower precision representation without retraining.
  • The Neural Network Compression Framework (NNCF), part of the OpenVINO toolkit, supports quantization-aware training, post-training quantization, and weight compression.
  • The quantization process reduces precision by rounding and clipping values, and requires calibration data to choose accurate quantization ranges.
  • Fake quantization nodes can be added to a model during training to simulate the effects of quantization and adjust the model’s weights accordingly.
  • Quantization-aware training optimizes a model for quantization during the training process, whereas post-training quantization converts a model to lower precision after training is complete.
  • Weight compression can be used to reduce the size of a model’s weights, allowing for faster inference and reduced storage requirements.
  • The choice of quantization method depends on the specific use case and requirements; basic post-training quantization and post-training quantization with accuracy control are the most commonly used methods.
  • Quantization can be used in conjunction with other optimization techniques, such as pruning and sparsity, to further reduce the size and complexity of a model.
  • Real performance differences can be seen between quantized and floating-point models, with the quantized model running noticeably faster and more efficiently.
  • The OpenVINO toolkit provides a range of tools and frameworks for optimizing and running neural networks, including the NNCF and OpenVINO Runtime.