Tutorials - Pavithra Eswaramoorthy, Dharhas Pothina: Data of Unusual Size: Interactive Visualization

Discover how to work with large datasets using Dask, HoloViz, and hvPlot, and unlock the power of interactive visualization in this tutorial, featuring demonstrations in Jupyter notebooks and cloud-based computing.

Key takeaways
  • Interactive visualization: Pavithra Eswaramoorthy and Dharhas Pothina’s tutorial on working with datasets of unusual size focuses on Dask, a Python library for parallel computing, and HoloViz, an ecosystem of Python visualization tools.
  • Dask: Dask is used to scale computations and handle large datasets. It splits computations into smaller tasks and executes them in parallel across multiple cores or machines.
  • Dask Gateway: Dask Gateway is an open-source service for launching and managing Dask clusters. It allows users to start Dask clusters in the cloud or locally and connect to them from Jupyter notebooks.
  • hvPlot: hvPlot is a plotting library in the HoloViz ecosystem that provides interactive visualizations. It can be used to create a wide range of plots, including histograms, bar charts, and scatter plots.
  • Data preparation: The tutorial demonstrates how to read and prepare large datasets using Dask’s read_csv() function.
  • Interactive visualization tools: The tutorial introduces several tools for interactive visualization of large data, including hvPlot and other HoloViz libraries, with Dask handling the underlying computation.
  • Data cleaning: The tutorial highlights the importance of data cleaning, particularly when working with large datasets.
  • Jupyter notebooks: The tutorial uses Jupyter notebooks to demonstrate how to interact with large datasets using Dask and hvPlot.
  • Cloud-based computing: The tutorial discusses the use of cloud-based computing for scaling computations and handling large datasets.
  • Limitations: The tutorial mentions the limitations of traditional data processing tools, such as slow performance and limited scalability, and highlights the benefits of using Dask and HoloViz for working with big data.
  • Panel serve: panel serve is a command from Panel, a HoloViz library, that serves a Jupyter notebook or Python script as an interactive web application.
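
The Dask and data-cleaning takeaways above describe a partition, compute, and combine pattern. The block below is a minimal sketch of that idea using only the Python standard library; it is not Dask's API and not the tutorial's code. The sample data, the column names (city, fare), and every function name are hypothetical and chosen purely for illustration.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sample data; the blank fare stands in for a dirty record.
RAW_CSV = """city,fare
Austin,10.0
Austin,14.0
Boston,
Boston,8.0
Austin,6.0
Boston,13.0
"""


def read_rows(text):
    """Parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def partition(rows, size):
    """Split the rows into consecutive chunks of at most `size` rows."""
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def summarize(chunk):
    """Clean one chunk and return a partial (sum, count) per city."""
    partial = {}
    for row in chunk:
        if not row["fare"]:  # data cleaning: skip rows with a missing fare
            continue
        total, count = partial.get(row["city"], (0.0, 0))
        partial[row["city"]] = (total + float(row["fare"]), count + 1)
    return partial


def combine(partials):
    """Merge the per-chunk partial results into a mean fare per city."""
    merged = {}
    for partial in partials:
        for city, (total, count) in partial.items():
            prev_total, prev_count = merged.get(city, (0.0, 0))
            merged[city] = (prev_total + total, prev_count + count)
    return {city: total / count for city, (total, count) in merged.items()}


def mean_fare_by_city(text, chunk_size=2, workers=2):
    """Run the per-chunk work in a thread pool, then combine the results."""
    chunks = partition(read_rows(text), chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(summarize, chunks))
    return combine(partials)


if __name__ == "__main__":
    print(mean_fare_by_city(RAW_CSV))
```

The result does not depend on how the rows are chunked, because each chunk returns sums and counts rather than means. With Dask itself, the same steps would typically be written with dask.dataframe: read_csv() to load the files lazily as partitions, dropna() to clean them, a groupby mean to aggregate, and compute() to trigger execution.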