Cesar Garcia - Improving Open Data Quality using Python | PyData Global 2023

Discover how to improve open data quality using Python, pandas, and Great Expectations, covering data characteristics, EDA, validation, and documentation. The talk also explores the importance of communication, accessibility, and data storytelling.

Key takeaways
  • Improving Open Data Quality using Python and Great Expectations
  • Importance of data characteristics (accuracy, completeness, consistency, credibility)
  • Evaluating data sets using standards and international norms
  • Exploratory Data Analysis (EDA) to identify data issues
  • Using the pandas library for data manipulation
  • Need for data validation and documentation
  • Importance of data context and mapping
  • Role of Great Expectations in data quality improvement
  • Using open source solutions for data processing
  • Importance of communication, documentation, and validation in data quality improvement
  • Need for data storytelling using data visualization
  • Importance of openness and accessibility in data sharing
  • Use of JSON, CSV, and other file formats
  • Process of dataset cleaning and preprocessing
  • Automation of data processing using Python
  • Use of Jupyter notebook for data exploration and visualization
  • Importance of data quality scores and metrics
  • Role of data analysis in decision making
  • Need for data literacy in data science
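
As a rough illustration of the EDA, validation, and quality-score takeaways above, the following is a minimal sketch that uses pandas only. It is not code from the talk: the sample CSV, the column names, the quality_report helper, and the specific rules (non-negative readings, title-cased city names) are all assumptions made up for the example. A validation library would normally express such rules declaratively; plain pandas is used here to keep the sketch self-contained.

```python
import io

import pandas as pd

# Hypothetical sample standing in for an open dataset published as CSV.
RAW_CSV = """station_id,city,reading_date,pm10
S1,Madrid,2023-01-01,21.5
S2,Madrid,2023-01-01,
S3,madrid,2023-01-02,-4.0
S3,madrid,2023-01-02,-4.0
S4,Sevilla,not-a-date,18.2
"""


def quality_report(df: pd.DataFrame) -> dict:
    """Return a few illustrative quality metrics, each between 0 and 1."""
    # Completeness: share of non-missing values in the measurement column.
    pm10_completeness = df["pm10"].notna().mean()

    # Uniqueness: share of rows that are not exact repeats of an earlier row.
    row_uniqueness = 1 - df.duplicated().mean()

    # Validity: dates must parse, and recorded readings must be non-negative
    # (assumed rule for this made-up measurement).
    parsed_dates = pd.to_datetime(df["reading_date"], errors="coerce")
    date_validity = parsed_dates.notna().mean()
    pm10_validity = (df["pm10"].dropna() >= 0).mean()

    # Consistency: city names are expected in title case (assumed convention).
    city_consistency = (df["city"] == df["city"].str.title()).mean()

    return {
        "pm10_completeness": float(pm10_completeness),
        "row_uniqueness": float(row_uniqueness),
        "date_validity": float(date_validity),
        "pm10_validity": float(pm10_validity),
        "city_consistency": float(city_consistency),
    }


df = pd.read_csv(io.StringIO(RAW_CSV))
report = quality_report(df)

for metric, score in report.items():
    print(f"{metric}: {score:.2f}")
```

Each metric maps to one of the data characteristics listed above (completeness, consistency, accuracy), and a report like this can be rerun automatically whenever a new version of the dataset is published.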