Jeroen Janssens - How I hacked UMAP and won at a plotting contest | PyData Amsterdam 2024
Learn how to visualize UMAP's dimension reduction process through clever hacks, animations, and Python plotting tools. Plus: insights into algorithm behavior and FFmpeg tricks.
-
UMAP is a state-of-the-art dimension reduction algorithm, and animating its intermediate steps makes its behavior easier to understand
-
plotnine is a Python plotting library based on the grammar of graphics and inspired by R’s ggplot2; it offers a balance between quick plots and production-quality visualizations
-
Visualizing algorithm behavior can provide valuable insights - don’t just apply algorithms blindly, understand how their outputs relate to inputs and hyperparameters
-
FFmpeg is a versatile command-line tool that can be used to create animations and stitch frames together, even when more modern tools fail
-
The MNIST dataset (70,000 handwritten digits in 784 dimensions) serves as a good example dataset for demonstrating dimension reduction techniques
-
Plotting libraries each have their strengths: Matplotlib for customization, Altair for interactivity, plotnine for grammar of graphics, Seaborn for statistical visualization
-
Hacking algorithms (making clever modifications) can help understand their inner workings - in this case, modifying UMAP to save intermediate results
-
Command-line tools remain relevant and powerful for data visualization workflows, especially when dealing with file operations and video processing
-
When visualizing algorithms, it’s valuable to show intermediate steps and evolution of the process, not just final results
-
Understanding algorithm behavior doesn’t always require deep mathematical knowledge - visualization can provide intuitive insights
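
The idea of saving intermediate results and stitching frames together, mentioned in the takeaways above, could look roughly like the following minimal sketch. It rests on several assumptions: `toy_layout` is a hypothetical stand-in optimizer that simply pulls points toward the centroid of their label. It is not UMAP and not the modification shown in the talk. The file names `frame_%04d.png` and `umap.mp4` are also made up for illustration. The FFmpeg options used (`-framerate`, `-i`, `-c:v`, `-pix_fmt`) are standard ones, but the right settings depend on how the frames were rendered.

```python
import random


def toy_layout(labels, n_epochs=50, snapshot_every=10, lr=0.1, seed=0):
    """Stand-in for an iterative embedding optimizer (NOT UMAP).

    Pulls each 2-D point toward the centroid of its label and returns
    the final positions plus the snapshots taken along the way.
    """
    rng = random.Random(seed)
    points = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in labels]
    snapshots = []

    for epoch in range(n_epochs):
        # The "hack": copy the intermediate state instead of discarding it.
        if epoch % snapshot_every == 0:
            snapshots.append((epoch, [p[:] for p in points]))

        # Accumulate the sum and count of points per label.
        sums = {}
        for (x, y), label in zip(points, labels):
            sx, sy, n = sums.get(label, (0.0, 0.0, 0))
            sums[label] = (sx + x, sy + y, n + 1)

        # Move every point a small step toward its label's centroid.
        for p, label in zip(points, labels):
            sx, sy, n = sums[label]
            p[0] += lr * (sx / n - p[0])
            p[1] += lr * (sy / n - p[1])

    # Always keep the final state as the last frame.
    snapshots.append((n_epochs, [p[:] for p in points]))
    return points, snapshots


def ffmpeg_command(pattern="frame_%04d.png", output="umap.mp4", fps=30):
    """Build an FFmpeg call that turns numbered PNG frames into a video."""
    return [
        "ffmpeg", "-y",
        "-framerate", str(fps),   # input frame rate
        "-i", pattern,            # numbered frames, e.g. frame_0000.png
        "-c:v", "libx264",        # H.264 video codec
        "-pix_fmt", "yuv420p",    # pixel format most players accept
        output,
    ]


labels = [i % 3 for i in range(30)]
final, snapshots = toy_layout(labels)
print([epoch for epoch, _ in snapshots])
print(" ".join(ffmpeg_command()))
```

Each snapshot would be rendered to one image with the plotting library of choice, and the command list can then be passed to `subprocess.run` once the frames exist on disk. Copying the point list at each snapshot matters: storing a reference instead would make every frame show the final state.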