Vladimir Osin - Taming the Machine: Basics of ML Models Training and Inference Optimization

Learn the basics of machine learning model training and inference optimization, including mixed precision training, ONNX Runtime, quantization, pruning, tensor parallelism, model parallelism, and more, to accelerate and deploy your models efficiently.

Key takeaways
  • Use mixed precision training to speed up model training (see the mixed precision sketch after this list).
  • Compile models with ONNX Runtime for faster inference (see the export-and-inference sketch after this list).
  • Use Jupyter notebooks as a front end for model training and deployment.
  • Quantization and pruning can reduce model size and speed up inference (see the combined sketch after this list).
  • Use tensor parallelism and model parallelism for multi-GPU training (a minimal model parallel sketch follows this list).
  • Tune the batch size and the choice of optimizer for better training throughput.
  • Consider using containerization for model deployment.
  • Use PyTorch’s torch.compile for faster model training (see the sketch after this list).
  • Gradient checkpointing can reduce memory usage during model training (see the checkpointing sketch after this list).
  • Use ONNX Runtime for model serving and deployment.
  • Consider using Mojo for model training and deployment.
  • Use batch normalization to stabilize training and reduce training time.
  • Compiler infrastructure can help optimize model training.
  • Consider using AutoML tools for model selection and deployment.
  • Use parallelization and GPU acceleration for faster model training.
  • Consider using different hardware platforms for model deployment.
  • Monitor model performance and drift using tools like Papermill.
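
To make the first takeaway concrete, here is a minimal mixed precision training loop using PyTorch's torch.cuda.amp. The model, data, and hyperparameters are placeholders, not code from the talk.

```python
import torch
from torch import nn

# Placeholder model and data; swap in your own. Requires a CUDA device.
model = nn.Linear(512, 10).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler()  # scales the loss to avoid fp16 gradient underflow

for step in range(100):
    inputs = torch.randn(32, 512, device="cuda")
    targets = torch.randint(0, 10, (32,), device="cuda")

    optimizer.zero_grad(set_to_none=True)
    # Run the forward pass in mixed precision (fp16/bf16 where numerically safe).
    with torch.cuda.amp.autocast():
        loss = loss_fn(model(inputs), targets)
    scaler.scale(loss).backward()   # backward on the scaled loss
    scaler.step(optimizer)          # unscales gradients, then steps
    scaler.update()                 # adjusts the scale factor for the next step
```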
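
A sketch of exporting a model to ONNX and running it with ONNX Runtime; the model, file name, and tensor shapes are illustrative assumptions.

```python
import numpy as np
import torch
from torch import nn
import onnxruntime as ort

# Placeholder model; replace with your trained network.
model = nn.Sequential(nn.Linear(512, 128), nn.ReLU(), nn.Linear(128, 10)).eval()
dummy = torch.randn(1, 512)

# Export to ONNX with a dynamic batch dimension.
torch.onnx.export(
    model, dummy, "model.onnx",
    input_names=["input"], output_names=["logits"],
    dynamic_axes={"input": {0: "batch"}, "logits": {0: "batch"}},
)

# ONNX Runtime applies graph optimizations when the session is created.
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
batch = np.random.randn(8, 512).astype(np.float32)
logits = session.run(["logits"], {"input": batch})[0]
print(logits.shape)  # (8, 10)
```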
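
A minimal sketch of quantization and pruning together, using PyTorch's built-in utilities: magnitude pruning via torch.nn.utils.prune, then post-training dynamic quantization via torch.ao.quantization. The model is a placeholder.

```python
import torch
from torch import nn
from torch.nn.utils import prune

# Placeholder float32 model; replace with your trained network.
model = nn.Sequential(nn.Linear(512, 128), nn.ReLU(), nn.Linear(128, 10)).eval()

# Magnitude pruning: zero out the 30% smallest weights in each Linear layer.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")  # bake the mask into the weights

# Post-training dynamic quantization: int8 weights, activations quantized
# on the fly at inference time. No calibration data required.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 512)
print(quantized(x).shape)  # same interface, smaller and faster on CPU
```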
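
Tensor parallelism typically requires a dedicated framework, but naive model parallelism can be sketched directly in PyTorch by placing submodules on different devices. The two-GPU split below is an illustrative assumption, not the talk's code.

```python
import torch
from torch import nn

class TwoGPUModel(nn.Module):
    """Naive model parallelism: first half on cuda:0, second half on cuda:1."""
    def __init__(self):
        super().__init__()
        self.part1 = nn.Sequential(nn.Linear(512, 1024), nn.ReLU()).to("cuda:0")
        self.part2 = nn.Linear(1024, 10).to("cuda:1")

    def forward(self, x):
        h = self.part1(x.to("cuda:0"))
        return self.part2(h.to("cuda:1"))  # move activations between devices

# Requires two CUDA devices.
model = TwoGPUModel()
out = model(torch.randn(32, 512))
print(out.device)  # cuda:1
```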
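
A minimal torch.compile example (PyTorch 2.x); the model is a placeholder.

```python
import torch
from torch import nn

model = nn.Sequential(nn.Linear(512, 128), nn.ReLU(), nn.Linear(128, 10))

# torch.compile JIT-compiles the model into optimized kernels; the first
# call triggers compilation, subsequent calls reuse the compiled graph.
compiled = torch.compile(model)

x = torch.randn(32, 512)
print(compiled(x).shape)  # torch.Size([32, 10])
```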
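
A sketch of gradient checkpointing with torch.utils.checkpoint.checkpoint_sequential; the depth and layer sizes are placeholders. Activations inside each segment are discarded after the forward pass and recomputed during backward, trading extra compute for lower memory.

```python
import torch
from torch import nn
from torch.utils.checkpoint import checkpoint_sequential

# A deep placeholder model where activations dominate memory usage.
layers = [nn.Sequential(nn.Linear(512, 512), nn.ReLU()) for _ in range(8)]
model = nn.Sequential(*layers)

x = torch.randn(32, 512, requires_grad=True)
# Split the model into 2 segments; only segment-boundary activations are
# kept, the rest are recomputed on the backward pass.
out = checkpoint_sequential(model, 2, x)
out.sum().backward()
```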