Santiago Soler - Pooch: a friend to fetch your data files | SciPy 2024
Learn how Pooch, a Python library, helps download and cache data files from the web, with checksum verification, multi-protocol support, and integration with data analysis tools.
- Pooch is a Python library for downloading and caching data files from the web while verifying file integrity through checksums.
- Key features:
    - Downloads over multiple protocols and from services such as GitHub, Zenodo, and Dataverse
    - File integrity verification
    - Caching system to avoid redundant downloads
    - Support for custom downloaders and processors
    - Chunked downloads for large files
 
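Caching and integrity verification work together: a file is downloaded only when it is missing from the cache or fails its checksum. The following is a simplified, standard-library-only illustration of that pattern, not Pooch's own source code. `download` is a hypothetical stand-in for a real network transfer, which lets the example run offline.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Hash a file in chunks so large files never have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cached_fetch(fname, known_sha256, cache_dir, download):
    """Return a verified local copy of fname, downloading only when needed.

    `download(fname, target)` is a hypothetical callable that writes the
    remote file to `target`; it stands in for a real network transfer.
    """
    target = Path(cache_dir) / fname
    # Cache hit: the file is present and intact, so skip the download.
    if target.exists() and sha256_of(target) == known_sha256:
        return target
    target.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary name first so a failed or corrupted transfer
    # never leaves a bad file at the final cache location.
    partial = target.with_name(target.name + ".part")
    download(fname, partial)
    if sha256_of(partial) != known_sha256:
        partial.unlink()
        raise ValueError(f"Checksum mismatch for {fname}")
    partial.replace(target)
    return target


# Offline demo with a fake downloader that records every call.
calls = []


def fake_download(fname, target):
    calls.append(fname)
    Path(target).write_bytes(b"hello pooch\n")


expected = hashlib.sha256(b"hello pooch\n").hexdigest()
cache = tempfile.mkdtemp()
first = cached_fetch("hello.txt", expected, cache, fake_download)
second = cached_fetch("hello.txt", expected, cache, fake_download)
print(len(calls))  # 1: the second call is served from the cache
```

A file that exists but no longer matches its hash is treated like a missing one and fetched again. This is the behavior you want when a cached copy has been corrupted.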
- Common use cases:
    - Package maintainers managing sample datasets
    - Researchers downloading scientific data
    - Teachers and tutorial authors sharing example files
    - Integration into reproducible workflows
 
- Implementation approaches:
    - Basic usage with `pooch.retrieve()` for simple downloads
    - The `Pooch` class for managing multiple files via a registry
    - Version-specific downloads (development vs. release versions)
    - Custom processors for handling archives such as zip files
 
- Advanced capabilities:
    - Shared caches across user groups
    - Custom download processors
    - Plugin system for community-developed downloaders
    - Registry management for file names and hashes
    - Integration with data analysis tools (pandas, xarray)
 
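Processors are callables that Pooch runs after a file is fetched. Each one receives the file path, an `action` string (`"download"`, `"update"`, or `"fetch"`), and the `Pooch` instance, and returns the path that is handed back to the user. Below is a minimal sketch of a custom processor that follows this protocol and decompresses a gzip file. Pooch already ships `pooch.Decompress` and `pooch.Unzip` for common cases, so the example is purely illustrative. The demo calls the processor directly on a local file instead of downloading one.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def gunzip_processor(fname, action, pup):
    """Post-download hook: decompress a .gz file and return the new path.

    Decompress only when the archive is new or updated, or when the
    decompressed copy is missing; otherwise reuse the existing output.
    `pup` (the Pooch instance) is part of the protocol but unused here.
    """
    source = Path(fname)
    target = source.with_suffix("")  # "data.csv.gz" -> "data.csv"
    if action in ("download", "update") or not target.exists():
        with gzip.open(source, "rb") as compressed, open(target, "wb") as out:
            shutil.copyfileobj(compressed, out)
    return str(target)


# Offline demo: create a small gzip file and run the processor on it.
tmp = Path(tempfile.mkdtemp())
archive = tmp / "data.csv.gz"
with gzip.open(archive, "wb") as handle:
    handle.write(b"a,b\n1,2\n")

result = gunzip_processor(str(archive), "download", None)
print(Path(result).read_bytes())  # b'a,b\n1,2\n'
```

In practice the function would be passed as `processor=gunzip_processor` to `pooch.retrieve` or `Pooch.fetch`. The returned path can then go straight into `pandas.read_csv` or `xarray.open_dataset`.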
- Future roadmap:
    - Improved logging configuration
    - Better handling of custom URLs
    - A single registry for URLs and hashes
    - An enhanced plugin system
    - Support for the JSON file format