Shaurya Agarwal - All Them Data Engines: Data Munging with Python circa 2023 | PyData Global 2023

Learn effective data munging techniques in Python using NumPy, pandas, and defaultdict, with benefits such as improved readability, lower memory overhead, and efficient calculations.

Key takeaways
  • Use Python for data munging with NumPy and pandas.
  • Python code is more readable and easier to maintain with pandas.
  • Use a list comprehension to collect the tags, then deduplicate and sort them to get a list of unique tags.
  • The GroupBy object in pandas is useful for aggregating data.
  • Avoid using lists for large amounts of data; use NumPy arrays or pandas DataFrames instead.
  • pandas has lower memory overhead than plain Python lists because it stores data in NumPy arrays under the hood.
  • Use defaultdict to handle missing keys when grouping or accumulating values by hand.
  • Data types in NumPy are strict: each array has a single fixed dtype, which can make it easier to work with large datasets.
  • Use eager evaluation in Python for simplicity and performance.
  • Grouping data by year or genre in pandas is easy and straightforward.
  • Use NumPy arrays to do calculations and aggregations efficiently.
  • Python’s typing module allows for type annotations, which can be used to improve code readability and maintainability.
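
As a rough illustration of several of the takeaways above, here is a minimal sketch. It is not code from the talk: the movie records, column names, and the helper unique_tags are all hypothetical, chosen only to show a list comprehension for unique tags, defaultdict for grouping by hand, a typed NumPy array, and a pandas groupby side by side.

```python
from collections import defaultdict

import numpy as np
import pandas as pd

# Hypothetical toy records: (title, year, genre, rating, tags).
movies = [
    ("Alpha", 2021, "drama", 7.5, ["indie", "slow"]),
    ("Beta", 2021, "comedy", 6.0, ["indie"]),
    ("Gamma", 2022, "drama", 8.5, ["epic", "slow"]),
    ("Delta", 2022, "drama", 6.5, ["epic"]),
]


def unique_tags(rows: list) -> list:
    """Flatten all tags with a list comprehension, then deduplicate and sort."""
    all_tags = [tag for row in rows for tag in row[4]]
    return sorted(set(all_tags))


tags = unique_tags(movies)  # ['epic', 'indie', 'slow']

# defaultdict creates the empty list on first access, so no key check is needed.
ratings_by_genre = defaultdict(list)
for _title, _year, genre, rating, _tags in movies:
    ratings_by_genre[genre].append(rating)

# A NumPy array has one fixed dtype, which keeps storage compact
# and makes aggregations such as mean() fast.
ratings = np.array([row[3] for row in movies], dtype=np.float64)
overall_mean = ratings.mean()  # 7.125

# The same grouping expressed with pandas: groupby returns a GroupBy
# object, and mean() aggregates each group.
df = pd.DataFrame(
    {
        "title": [row[0] for row in movies],
        "year": [row[1] for row in movies],
        "genre": [row[2] for row in movies],
        "rating": [row[3] for row in movies],
    }
)
mean_by_year = df.groupby("year")["rating"].mean()
mean_by_genre = df.groupby("genre")["rating"].mean()

print(tags)
print(dict(ratings_by_genre))
print(overall_mean)
print(mean_by_year)
print(mean_by_genre)
```

The defaultdict loop and the pandas groupby produce the same groups; the pandas version is shorter and scales better once the data no longer fits comfortably in plain Python lists.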