Jeroen Janssens - How I hacked UMAP and won at a plotting contest | PyData Amsterdam 2024

Learn how to visualize UMAP's dimension reduction process through clever hacks, animations, and Python plotting tools. Plus: insights into algorithm behavior and FFmpeg tricks.

Key takeaways
  • UMAP is a state-of-the-art dimension reduction algorithm, and animating its intermediate steps makes its behavior much easier to understand

  • plotnine is a powerful Python plotting library based on the grammar of graphics and inspired by R’s ggplot2; it strikes a balance between quick exploratory plots and production-quality visualizations

  • Visualizing algorithm behavior can provide valuable insights: don’t apply algorithms blindly, but understand how their outputs relate to their inputs and hyperparameters

  • FFmpeg is a versatile command-line tool that can be used to create animations and stitch frames together, even when more modern tools fail

  • The MNIST dataset (70,000 handwritten digits, each a 28 × 28 image, i.e. 784 dimensions) is a good example dataset for demonstrating dimension reduction techniques

  • Each plotting library has its own strengths: Matplotlib for customization, Altair for interactivity, plotnine for the grammar of graphics, Seaborn for statistical visualization

  • Hacking an algorithm (making clever modifications to its code) can help you understand its inner workings; in this case, UMAP was modified to save its intermediate results

  • Command-line tools remain relevant and powerful for data visualization workflows, especially when dealing with file operations and video processing

  • When visualizing algorithms, it’s valuable to show the intermediate steps and the evolution of the process, not just the final result

  • Understanding algorithm behavior doesn’t always require deep mathematical knowledge: visualization can provide intuitive insights
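The central hack, making UMAP keep its intermediate results, can be illustrated without UMAP itself. The sketch below is a hypothetical, stdlib-only toy optimizer written for this summary; it is not umap-learn's code, and `optimize_layout`, `save_every` and `repulsion` are made-up names. It loosely mimics the shape of UMAP's layout optimization (pull neighboring points together, push randomly sampled points apart, decay the learning rate over the epochs), and the only "hack" is copying the coordinates every few epochs so each snapshot can later be rendered as one animation frame.

```python
import math
import random


def _clip(value, limit=4.0):
    """Clamp a single step component to [-limit, limit]."""
    return max(-limit, min(limit, value))


def optimize_layout(points, edges, n_epochs=50, learning_rate=0.1,
                    repulsion=0.1, save_every=10, seed=0):
    """Toy UMAP-style layout optimizer that records intermediate layouts.

    Hypothetical sketch, not umap-learn's code. `points` is a list of
    [x, y] pairs updated in place; `edges` lists (i, j) pairs of
    neighbors that should end up close together. Returns a list of
    (epoch, snapshot) tuples, each snapshot a copy of all coordinates.
    """
    rng = random.Random(seed)
    n = len(points)
    snapshots = [(0, [p[:] for p in points])]  # the initial layout
    for epoch in range(1, n_epochs + 1):
        # Learning rate decays linearly towards zero over the epochs
        alpha = learning_rate * (1.0 - (epoch - 1) / n_epochs)
        for i, j in edges:
            # Attraction: move both neighbors a step toward each other
            dx = points[j][0] - points[i][0]
            dy = points[j][1] - points[i][1]
            points[i][0] += alpha * dx
            points[i][1] += alpha * dy
            points[j][0] -= alpha * dx
            points[j][1] -= alpha * dy
            # Repulsion: push i away from one randomly sampled point
            k = rng.randrange(n)
            if k == i:
                continue
            rx = points[i][0] - points[k][0]
            ry = points[i][1] - points[k][1]
            d2 = rx * rx + ry * ry + 1e-3  # small constant avoids division by zero
            points[i][0] += alpha * _clip(repulsion * rx / d2)
            points[i][1] += alpha * _clip(repulsion * ry / d2)
        # The "hack": keep a copy of the layout every `save_every` epochs
        if epoch % save_every == 0 or epoch == n_epochs:
            snapshots.append((epoch, [p[:] for p in points]))
    return snapshots


if __name__ == "__main__":
    rng = random.Random(42)
    pts = [[rng.uniform(-5, 5), rng.uniform(-5, 5)] for _ in range(20)]
    # Hypothetical neighbor graph: two chains, points 0-9 and 10-19
    edges = [(i, i + 1) for i in range(19) if i != 9]
    for epoch, snap in optimize_layout(pts, edges):
        mean_len = sum(math.dist(snap[i], snap[j]) for i, j in edges) / len(edges)
        print(f"epoch {epoch:3d}: mean neighbor distance {mean_len:.2f}")
```

Copying each snapshot (rather than storing a reference to `points`) matters: without the copy, every saved frame would show the final layout, because the optimizer updates the coordinates in place.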
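Stitching saved frames into a video with FFmpeg mostly comes down to consistent, zero-padded filenames. The following sketch assumes PNG frames named `frame_0000.png`, `frame_0001.png`, and so on; the filename pattern, frame rate, output name and the helper names `frames_to_video_cmd` and `frame_name` are hypothetical. It only builds the FFmpeg argument list and does not run it.

```python
import shlex


def frames_to_video_cmd(pattern="frame_%04d.png", fps=30, output="umap.mp4"):
    """Build an FFmpeg command that stitches numbered PNG frames into an MP4.

    The pattern, frame rate and output name are hypothetical defaults.
    """
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it exists
        "-framerate", str(fps),   # frame rate of the input image sequence
        "-i", pattern,            # printf-style pattern: frame_0000.png, frame_0001.png, ...
        "-c:v", "libx264",        # encode the video stream with H.264
        "-pix_fmt", "yuv420p",    # pixel format that most players can decode
        output,
    ]


def frame_name(index, pattern="frame_%04d.png"):
    """Filename for frame `index`, matching the pattern FFmpeg will read."""
    return pattern % index


if __name__ == "__main__":
    print(frame_name(7))
    print(shlex.join(frames_to_video_cmd()))
```

Once the frames exist, the list can be passed to `subprocess.run(cmd, check=True)`; passing a list rather than a single shell string sidesteps shell quoting altogether. Encoding with `libx264` and `yuv420p` requires even frame width and height, so the plotting step should save frames with even pixel dimensions.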