Nikolas Markou - Artificial Intelligence for Vision: A walkthrough of recent breakthroughs

Explore the latest breakthroughs in computer vision, including the emergence of transformers, multi-scale vision transformers, and innovative models that can recognize objects in images, videos, and 3D data.

Key takeaways

Computer vision is the field of AI that helps machines interpret and understand visual information.
The recent breakthroughs in computer vision are due to the emergence of transformers, which have enabled the creation of larger and more powerful models.
Vision transformers treat an image as a sequence of patches and process it with a transformer encoder, much as language models process sequences of tokens.
The multi-scale vision transformer is a recent innovation that has achieved state-of-the-art results in image recognition and object detection tasks.
The vision transformer treats an image as a kind of language, which lets the model understand and recognize the objects in it.
The transformer architecture, with its attention mechanism, has changed the field of computer vision.
Computer vision is no longer limited to static images, but can now handle videos and 3D data.
The field of computer vision is evolving rapidly, with new breakthroughs and innovations being developed continuously.
The most commonly used models for object detection are YOLOv5 and YOLOv8, which have dominated the field due to their speed and accuracy.
The ConvNeXt family of models, especially ConvNeXt V1 and V2, are modernized convolutional networks and good alternatives to older CNN architectures.
The number of parameters in a model has a significant impact on its performance, with larger models generally performing better.
The choice of activation function can also affect performance, with Swish among the more recent and popular options.
Data augmentation techniques are essential for improving the performance of computer vision models.
The future of computer vision is likely to involve the development of larger and more powerful models that can handle complex tasks such as scene understanding and object tracking.
The rise of transformers in computer vision has enabled the creation of models that can handle multiple modalities, including images, text, and speech.
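
To make the patch and attention takeaways above concrete, here is a minimal sketch in plain Python, not code from the talk. It splits a toy 4x4 grayscale image into 2x2 patches and runs one round of scaled dot-product self-attention over them. The function names, the toy image, and the use of identity query/key/value projections are assumptions made for illustration; a real vision transformer learns those projections and adds positional embeddings, multiple heads, and many layers.

```python
import math


def image_to_patches(image, patch):
    """Split an H x W grid (list of rows) into flattened patches, row-major."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image sides must be divisible by the patch size")
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            patches.append([image[r][c]
                            for r in range(top, top + patch)
                            for c in range(left, left + patch)])
    return patches


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention.

    Assumption: queries, keys, and values are the tokens themselves
    (identity projections); a trained model uses learned matrices.
    """
    dim = len(tokens[0])
    outputs, weights = [], []
    for query in tokens:
        # Similarity of this patch to every patch, scaled by sqrt(dim).
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in tokens]
        row = softmax(scores)
        weights.append(row)
        # Each output is a weighted average of all patch vectors.
        outputs.append([sum(w * value[i] for w, value in zip(row, tokens))
                        for i in range(dim)])
    return outputs, weights


# Hypothetical toy image: 4x4 grayscale values.
image = [[0.0, 0.1, 0.2, 0.3],
         [0.4, 0.5, 0.6, 0.7],
         [0.8, 0.9, 1.0, 1.1],
         [1.2, 1.3, 1.4, 1.5]]

patches = image_to_patches(image, 2)        # 4 patches of 4 pixels each
outputs, weights = self_attention(patches)  # one attention row per patch
```

Each row of weights sums to 1 and says how strongly one patch attends to every other patch, which is what lets the model relate distant regions of an image in a single layer.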
