Braaten et al. - Bridging the gap between Earth Engine and the Scientific Python Ecosystem

Learn how to integrate Google Earth Engine with the Python ecosystem using geemap, processing petabyte-scale satellite data in the cloud without downloading it locally.

Key takeaways
  • Google Earth Engine contains ~100 petabytes of satellite data and adds ~1 petabyte monthly, making it a massive repository for geospatial analysis

  • The platform offers distributed computing capabilities, allowing users to process large datasets without downloading them locally or managing infrastructure

  • New developments have improved connectivity between Earth Engine and the Python ecosystem, enabling seamless integration with libraries like Pandas, GeoPandas, NumPy, and xarray

  • geemap serves as a bridge between Earth Engine and Python, providing one-line functions for common visualization and analysis tasks

  • Two main data catalogs exist:

    • Main catalog (Earth Engine’s master catalog)
    • Community catalog (user-contributed datasets)
  • Earth Engine provides ~250 GB of storage space per project and handles complex tasks such as:

    • Data projection alignment
    • Scale normalization
    • Distributed processing
  • Users can process data directly in the cloud and export only the results, avoiding the need to download massive raw datasets

  • The platform supports both vector and raster data analysis with simple conversion between formats

  • Interactive visualization capabilities allow real-time exploration of large-scale environmental and geospatial data

  • The system works on a “pull basis”: the client builds a description of the computation, which is sent to Google’s servers only when a result is requested; the servers process it across multiple nodes and return the result to the client
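The “pull basis” and the “process in the cloud, export only the results” points can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the Earth Engine API: `Expr`, `server_mean`, and `SERVER_DATA` are hypothetical names, a list of numbers stands in for a raster, and threads stand in for server nodes. It only shows the idea that the client assembles a lightweight recipe, that data is touched only when a result is requested, and that each node reduces its own tile so a single summary number comes back. In the real library the analogous steps are building `ee.Image` objects and then requesting a result, for example with `reduceRegion(...)` followed by `getInfo()`.

```python
# Toy model of deferred, server-side evaluation ("pull basis").
# Hypothetical names throughout; this is NOT the earthengine-api.
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

# Hypothetical stand-in for a large raster that lives only on the "server".
SERVER_DATA: dict[str, list[float]] = {
    "toy_image": [float(i) for i in range(100_000)],
}


@dataclass(frozen=True)
class Expr:
    """Client-side recipe: names a dataset and lists operations, holds no pixels."""

    source: str
    ops: tuple[tuple[str, float], ...] = ()

    def multiply(self, k: float) -> Expr:
        # Returns a new, longer recipe; nothing is computed here.
        return Expr(self.source, self.ops + (("mul", k),))

    def add(self, k: float) -> Expr:
        return Expr(self.source, self.ops + (("add", k),))


def _apply(ops: tuple[tuple[str, float], ...], value: float) -> float:
    # Replay the recorded operations on a single pixel value.
    for name, k in ops:
        value = value * k if name == "mul" else value + k
    return value


def _reduce_tile(ops, tile: list[float]) -> tuple[float, int]:
    # Work done by one simulated node: apply the recipe, return a partial sum and count.
    return sum(_apply(ops, v) for v in tile), len(tile)


def server_mean(expr: Expr, n_nodes: int = 4) -> float:
    """Simulated server: tile the data, reduce tiles in parallel, return one number."""
    data = SERVER_DATA[expr.source]
    size = -(-len(data) // n_nodes)  # ceiling division so every value lands in a tile
    tiles = [data[i:i + size] for i in range(0, len(data), size)]
    # Threads stand in for server nodes (they illustrate structure, not real speedup).
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partials = list(pool.map(lambda tile: _reduce_tile(expr.ops, tile), tiles))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


if __name__ == "__main__":
    recipe = Expr("toy_image").multiply(2).add(1)  # instant: only a recipe is built
    print(recipe.ops)  # (('mul', 2), ('add', 1))
    print(server_mean(recipe))  # 100000.0
```

Here `server_mean(Expr("toy_image").multiply(2).add(1))` evaluates 2x + 1 over 100,000 toy pixels split across four simulated nodes and returns only their mean, 100000.0. Building the recipe costs nothing; the work happens only at the request, and the client receives one number rather than the raw data.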