Awowale et al. - Simplifying analysis of hierarchical HDF5 and NetCDF4 files with xarray-datatree
Learn how DataTree simplifies working with hierarchical HDF5/NetCDF4 files, making NASA's petabyte-scale Earth observation data more accessible through efficient tree-based structures.
- DataTree simplifies working with hierarchical HDF5 and NetCDF4 files by representing their groups as a tree, so a whole file can be opened at once instead of opening each group as a separate dataset.
- NASA holds more than 100 petabytes of Earth observation data in HDF format, and this volume is expected to grow to 600 petabytes with data from new missions. Managing and accessing this data efficiently is a key challenge.
- Existing tools such as xarray and the netCDF4 library require each group to be specified and opened individually, which is inefficient and leads to complex code. DataTree exposes the entire hierarchy of a file at once.
- DataTree integrates with xarray to provide lazy loading, efficient computation, and familiar xarray operations while preserving the hierarchical structure of the data.
- The tool enables cloud-optimized access to HDF data by reducing unnecessary file operations and memory usage compared with traditional methods.
- DataTree simplifies subsetting by removing the need to flatten and then unflatten nested data structures and to copy datasets multiple times.
- The project is a collaboration between NASA and the open-source community, with NASA engineers contributing to make Earth science data more accessible.
- DataTree will be integrated into the main xarray package in an upcoming release, providing long-term support and standardization.
- The tool is not limited to HDF5 and NetCDF4: it supports other data formats and can be extended to work with other hierarchical formats.
- The current implementation shows significant performance gains: when working with nested groups, operations run up to 1000x faster than naive implementations.
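The tree idea from the first bullet can be made concrete with a toy sketch. The pure-Python code below is a hypothetical stand-in, not the DataTree API: `Node`, `build_tree`, `group_paths`, and the group names are made up for illustration. It assumes a file's groups have already been read into a dict keyed by group path, and shows how each group becomes a node addressable by a path and how the whole hierarchy can be listed in one pass. With the real library, recent xarray releases open a whole file as a tree with `xarray.open_datatree`.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy stand-in for one group: its variables plus its child groups (hypothetical)."""
    variables: dict = field(default_factory=dict)
    children: dict = field(default_factory=dict)

    def __getitem__(self, path):
        # Resolve "group/subgroup/name" one path component at a time.
        node = self
        *groups, last = path.strip("/").split("/")
        for name in groups:
            node = node.children[name]
        # The final component may name a child group or a variable.
        if last in node.children:
            return node.children[last]
        return node.variables[last]

    def group_paths(self, prefix="/"):
        # Yield this node's path and every descendant's path, depth-first.
        yield prefix
        for name, child in self.children.items():
            yield from child.group_paths(prefix.rstrip("/") + "/" + name)


def build_tree(groups):
    """Build a tree from {"/path/to/group": {variable: values}} (hypothetical helper)."""
    root = Node()
    for path, variables in groups.items():
        node = root
        for name in filter(None, path.split("/")):
            node = node.children.setdefault(name, Node())
        node.variables.update(variables)
    return root


# Made-up group layout, for illustration only.
tree = build_tree({
    "/": {"granule_id": ["G001"]},
    "/beam_a/heights": {"h": [10.0, 11.5, 12.0]},
    "/beam_b/heights": {"h": [9.0, 9.5]},
})

print(list(tree.group_paths()))
# ['/', '/beam_a', '/beam_a/heights', '/beam_b', '/beam_b/heights']
print(tree["beam_a/heights/h"])
# [10.0, 11.5, 12.0]
```

The point of the design is that a single object holds every group, so listing or indexing the hierarchy needs no separate open call per group.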
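The bullets on opening groups individually and on reducing file operations can be illustrated by counting open calls. This is a toy model under stated assumptions: `CountingFile`, `FILE`, and both helper functions are hypothetical, and real libraries may cache or reuse file handles, so actual savings depend on the backend and the storage. The naive pattern needs every group path up front and opens the file once per group; the tree-style pattern opens once and discovers the groups itself.

```python
class CountingFile:
    """Hypothetical file handle that counts how often the file is opened."""

    opens = 0  # class-level counter shared by all handles

    def __init__(self, contents):
        CountingFile.opens += 1
        self._contents = contents  # {group_path: {variable: values}}

    def group_paths(self):
        return list(self._contents)

    def read_group(self, path):
        return self._contents[path]


# Made-up file contents, for illustration only.
FILE = {
    "/": {"granule_id": ["G001"]},
    "/beam_a/heights": {"h": [10.0, 11.5, 12.0]},
    "/beam_b/heights": {"h": [9.0, 9.5]},
}


def open_group_by_group(paths):
    # Naive pattern: the caller must already know every group path,
    # and each group costs its own open call.
    return {path: CountingFile(FILE).read_group(path) for path in paths}


def open_all_at_once():
    # Tree-style pattern: one open, then discover and read every group
    # from the same handle.
    handle = CountingFile(FILE)
    return {path: handle.read_group(path) for path in handle.group_paths()}


CountingFile.opens = 0
naive = open_group_by_group(["/", "/beam_a/heights", "/beam_b/heights"])
print(CountingFile.opens)  # 3

CountingFile.opens = 0
tree_style = open_all_at_once()
print(CountingFile.opens)  # 1
print(naive == tree_style)  # True
```

Both approaches return the same data; only the number of opens and the need to know group names in advance differ. On cloud object storage, where each open can involve network requests, that difference is where the savings would come from.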
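Lazy loading, mentioned in the bullet on xarray integration, means the structure can be browsed without reading array data. The sketch below shows only the core idea with a hypothetical `LazyVariable` wrapper that defers a read until `.values` is first accessed and then caches the result. xarray's actual machinery (lazily indexed backend arrays, optionally Dask chunks) is more involved and is not reproduced here.

```python
_UNLOADED = object()  # sentinel: distinguishes "not read yet" from a real value


class LazyVariable:
    """Hypothetical lazily loaded variable: storage is read on first access only."""

    reads = 0  # class-level counter of actual storage reads

    def __init__(self, loader):
        self._loader = loader  # zero-argument callable that reads the data
        self._data = _UNLOADED

    @property
    def values(self):
        if self._data is _UNLOADED:
            LazyVariable.reads += 1
            self._data = self._loader()
        return self._data  # cached after the first read


# Made-up tree: each group maps variable names to lazy variables.
tree = {
    "/beam_a/heights": {"h": LazyVariable(lambda: [10.0, 11.5, 12.0])},
    "/beam_b/heights": {"h": LazyVariable(lambda: [9.0, 9.5])},
}

# Browsing the hierarchy touches only metadata, so nothing is read yet.
print(sorted(tree), LazyVariable.reads)
# ['/beam_a/heights', '/beam_b/heights'] 0

# Only the variable actually used is loaded, and only once.
print(sum(tree["/beam_b/heights"]["h"].values))  # 18.5
_ = tree["/beam_b/heights"]["h"].values
print(LazyVariable.reads)  # 1
```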
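The subsetting bullet contrasts tree-aware operations with flattening a nested structure, subsetting it, and rebuilding it. A minimal sketch of the tree-aware alternative, assuming groups are plain nested dicts and variables are lists, is a recursive map that applies one function to every variable while keeping the nesting intact. `map_over_tree` is a hypothetical helper; recent xarray releases provide their own tree-mapping method, `DataTree.map_over_datasets`, which applies a function to each node's dataset rather than to individual lists.

```python
def map_over_tree(tree, func):
    """Return a new tree with `func` applied to every variable (hypothetical helper).

    Groups are dicts and variables are lists. Recursing over the nesting keeps
    the hierarchy intact, so there is no flatten/unflatten round trip and no
    per-group copy written by hand.
    """
    return {
        name: map_over_tree(child, func) if isinstance(child, dict) else func(child)
        for name, child in tree.items()
    }


# Made-up nested granule, for illustration only.
granule = {
    "beam_a": {"heights": {"h": [10.0, 11.5, 12.0, 12.4]}},
    "beam_b": {"heights": {"h": [9.0, 9.5, 9.7]}},
}

# Subset: keep the first two samples of every variable, in every group.
subset = map_over_tree(granule, lambda values: values[:2])
print(subset)
# {'beam_a': {'heights': {'h': [10.0, 11.5]}}, 'beam_b': {'heights': {'h': [9.0, 9.5]}}}
```

The input tree is left unchanged, and the result has exactly the same group layout, which is what makes subsetting a single call rather than a flatten, subset, and rebuild sequence.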