Guillame-Bert & Spektor - Safe, fast, and easy time series preprocessing with Temporian | SciPy 2024
Learn about Temporian, a Python library for time series preprocessing that prevents data leakage and offers high performance through C++. See examples and best practices.
-
Temporian is a Python library for safe, simple and efficient preprocessing of temporal data, developed collaboratively by Google and Tryolabs
-
Key features include:
- Prevention of future data leakage through explicit operators
- High performance C++ core implementation
- Support for different temporal data types (time series, sequences, multivariate data)
- Native handling of hierarchical/indexed data
- Integration with common ML/data science tools
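
To make the leakage point concrete, here is a minimal pure-Python sketch (not Temporian's API or implementation; `trailing_mean` and `centered_mean` are hypothetical helper names). It assumes a simple working definition: a feature computed at time t is leak-free if its value does not change when later events are appended.

```python
def trailing_mean(timestamps, values, window):
    """Mean over the half-open window (t - window, t]: past and present only."""
    out = []
    for t in timestamps:
        past = [v for s, v in zip(timestamps, values) if t - window < s <= t]
        out.append(sum(past) / len(past))
    return out


def centered_mean(timestamps, values, window):
    """Mean over [t - window/2, t + window/2]: also reads future events."""
    half = window / 2
    out = []
    for t in timestamps:
        near = [v for s, v in zip(timestamps, values) if t - half <= s <= t + half]
        out.append(sum(near) / len(near))
    return out


# Data known "today", and the same data after one more event arrives.
ts_now, vals_now = [1, 2, 3], [10.0, 20.0, 30.0]
ts_later, vals_later = ts_now + [4], vals_now + [100.0]

# Trailing features for t = 1, 2, 3 are unaffected by the new event.
safe_before = trailing_mean(ts_now, vals_now, 2)
safe_after = trailing_mean(ts_later, vals_later, 2)[:3]

# Centered features at t = 3 change once the event at t = 4 exists.
leaky_before = centered_mean(ts_now, vals_now, 2)
leaky_after = centered_mean(ts_later, vals_later, 2)[:3]
```

In this sketch the trailing feature is stable under new data, while the centered one silently depends on the future, which is the kind of leakage the talk says Temporian is designed to make explicit.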
-
Data is handled through “event sets”, the core data structure that unifies the different temporal data types:
- Supports various timestamp formats and value types (int, float, boolean, string)
- Preserves hierarchical structure of data
- Enables efficient operations through memory optimizations
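
As a rough illustration of the idea of an indexed collection of timestamped events, the following pure-Python sketch groups flat records by an index key and keeps each group sorted by time. It is a toy model under stated assumptions, not Temporian's actual data structure; `build_event_set` and the field names `timestamp`, `store` and `sales` are hypothetical.

```python
from collections import defaultdict


def build_event_set(rows, index_names):
    """Group rows by index key; each group is a time-sorted list of events.

    Each event is a (timestamp, features) pair, where features holds every
    column that is neither the timestamp nor part of the index.
    """
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[name] for name in index_names)
        features = {
            k: v
            for k, v in row.items()
            if k != "timestamp" and k not in index_names
        }
        groups[key].append((row["timestamp"], features))
    # Sort on the timestamp only, so feature dicts are never compared.
    return {key: sorted(events, key=lambda e: e[0]) for key, events in groups.items()}


rows = [
    {"timestamp": 2.0, "store": "a", "sales": 5},
    {"timestamp": 1.0, "store": "a", "sales": 3},
    {"timestamp": 1.5, "store": "b", "sales": 7},
]
evset = build_event_set(rows, index_names=["store"])
```

Here each store gets its own independent, time-ordered sequence, which is one simple way to picture how hierarchical structure can be preserved alongside the events.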
-
Operations are chainable with a functional API:
- No side effects or modifications to original data
- Each operation returns new event sets
- Includes moving windows, resampling, aggregations
- Supports arbitrary Python functions through the map operator
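
The chainable, side-effect-free style described above can be sketched in plain Python as follows. This is an illustrative toy, not Temporian's API: `ToySeries` and its `moving_sum` and `map` methods are hypothetical names, and the window convention (t - window_length, t] is an assumption made for this example.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class ToySeries:
    """Immutable series: every operation returns a new ToySeries."""

    timestamps: Tuple[float, ...]
    values: Tuple[float, ...]

    def moving_sum(self, window_length: float) -> "ToySeries":
        # Sum of values whose timestamp lies in (t - window_length, t].
        sums = tuple(
            sum(
                v
                for s, v in zip(self.timestamps, self.values)
                if t - window_length < s <= t
            )
            for t in self.timestamps
        )
        return ToySeries(self.timestamps, sums)

    def map(self, fn: Callable[[float], float]) -> "ToySeries":
        # Apply an arbitrary Python function to each value.
        return ToySeries(self.timestamps, tuple(fn(v) for v in self.values))


raw = ToySeries((1.0, 2.0, 3.0, 5.0), (10.0, 20.0, 30.0, 40.0))

# Operations chain left to right; `raw` itself is never modified.
result = raw.moving_sum(2.0).map(lambda v: v / 2)
```

Because the dataclass is frozen and each method builds a new object, intermediate results can be reused or discarded freely without affecting the original data.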
-
Current limitations and status:
- Version 0.6 (pre-1.0)
- Single-threaded execution (multithreading planned)
- No native C++ interface yet
- Focused on local execution, though Apache Beam integration is possible for large-scale processing
-
Design philosophy emphasizes:
- Safety over speed for preprocessing operations
- Familiar API similar to pandas
- Minimal development and maintenance costs
- Unix philosophy of doing one thing well