Santiago Soler - Pooch: a friend to fetch your data files | SciPy 2024
Learn how Pooch, a Python library, helps download and cache data files from the web. It features checksum verification, support for multiple download protocols, and integration with data analysis tools.
Pooch is a Python library for downloading and caching data files from the web while verifying file integrity through checksums.
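The download-once, cache, and verify cycle can be illustrated with the standard library alone. This is a simplified sketch of the general pattern, not Pooch's actual code: fetch_cached and its download callback are hypothetical names, and the callback stands in for a network request so the example runs offline. SHA256 is used because it is also Pooch's default hash algorithm.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch_cached(
    name: str,
    known_hash: str,
    cache_dir: Path,
    download: Callable[[Path], None],
) -> Path:
    """Hypothetical helper: return a verified local copy of `name`.

    `download` is any callable that writes the file's bytes to the given path.
    """
    target = cache_dir / name
    # Cache hit: reuse the existing file if its hash still matches.
    if target.exists() and sha256_of(target) == known_hash:
        return target
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Download to a temporary file first, so a failed or corrupted
    # transfer never replaces a good cached copy.
    part = target.with_name(target.name + ".part")
    download(part)
    actual = sha256_of(part)
    if actual != known_hash:
        part.unlink()
        raise ValueError(f"Hash mismatch for {name}: expected {known_hash}, got {actual}")
    part.replace(target)
    return target


if __name__ == "__main__":
    payload = b"station,temp\nA,21.5\n"
    calls = []

    def fake_download(dest: Path) -> None:
        # Stand-in for an HTTP request; records each "network" call.
        calls.append(dest)
        dest.write_bytes(payload)

    with tempfile.TemporaryDirectory() as tmp:
        cache = Path(tmp) / "cache"
        expected = hashlib.sha256(payload).hexdigest()
        first = fetch_cached("data.csv", expected, cache, fake_download)
        second = fetch_cached("data.csv", expected, cache, fake_download)
        print(first == second, len(calls))  # True 1
```

The second call is served from the cache without calling the downloader, and a file whose hash does not match is rejected instead of being cached.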
Key features:
- Downloads from multiple protocols and services (GitHub, Zenodo, Dataverse)
- File integrity verification
- Caching system to avoid redundant downloads
- Support for custom downloaders and processors
- Chunked downloads for large files
Common use cases:
- Package maintainers managing sample datasets
- Researchers downloading scientific data
- Teachers/tutorial creators sharing example files
- Integration into reproducible workflows
Implementation approaches:
- Basic usage with pooch.retrieve() for simple downloads
- The Pooch class for managing multiple files via a registry
- Version-specific downloads (development vs. release versions)
- Custom processors for handling archives/zip files
Advanced capabilities:
- Shared caches across user groups
- Custom download processors
- Plugin system for community-developed downloaders
- Registry management for file names and hashes
- Integration with data analysis tools (pandas, xarray)
Future roadmap:
- Improved logging configuration
- Better handling of custom URLs
- Single registry for URLs and hashes
- Enhanced plugin system
- JSON file format support