Talks - Reuven M. Lerner: Times and dates in Pandas

Python

Learn to work efficiently with dates and times in Pandas - from datetime objects and timezones to time series analysis, resampling, and best practices for temporal data manipulation.

Key takeaways

Pandas can store dates/times as datetime64 values, which can use about 75% less memory than the same data stored as strings and enable rich datetime functionality
Use pd.to_datetime() to convert string dates to datetime objects. For CSV imports, use the parse_dates parameter of pd.read_csv()
Two key datetime concepts: specific moments in time (datetime objects) vs spans of time (timedelta objects)
Datetime columns can be accessed via the .dt accessor to extract components like year, month, day, hour etc.
Time series functionality is enabled by setting a datetime column as the index using set_index()
Chronological grouping can be done using pd.Grouper() with frequency codes like '1D' (daily) or '1M' (monthly, anchored to month end; spelled '1ME' as of pandas 2.2)
Resampling aggregates time series data at a different frequency, but requires a datetime index
Avoid inplace=True: it prevents method chaining, and the pandas developers have proposed deprecating it
Time zones can be handled using .tz_localize() to assign a zone to naive timestamps and .tz_convert() to convert between zones
Invalid datetime parsing can be handled with errors='coerce' to convert bad values to NaT (Not a Time)
Comparisons and sorting work naturally on datetime columns, which can be compared against either datetime strings or datetime objects
Pivot tables and groupby operations work well with datetime components for temporal analysis
The PyArrow backend generally handles datetime detection better than the default CSV parser
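
The parsing takeaways above can be sketched in a few lines. This is a minimal illustration, not code from the talk: the sample strings, the CSV text, and the column names `when` and `amount` are all made up for the example.

```python
import io

import pandas as pd

# Hypothetical sample data; the last entry is deliberately invalid.
raw = pd.Series(["2024-01-15 08:30", "2024-02-20 17:45", "not a date"])

# errors="coerce" turns unparseable values into NaT instead of raising.
parsed = pd.to_datetime(raw, errors="coerce")

# The .dt accessor exposes the components of each timestamp.
years = parsed.dt.year
months = parsed.dt.month
hours = parsed.dt.hour

# Compare the storage cost of the strings and the parsed datetimes.
string_bytes = raw.memory_usage(index=False, deep=True)
datetime_bytes = parsed.memory_usage(index=False, deep=True)

# For CSV input, parse_dates converts the named columns while reading.
csv_text = "when,amount\n2024-01-15 08:30,10\n2024-02-20 17:45,20\n"
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["when"])
```

The exact memory saving depends on the length of the strings and on the string dtype in use, so the figures from `memory_usage` are worth checking on real data.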
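
The distinction between moments and spans shows up as soon as timestamps are subtracted. A small sketch with invented timestamps:

```python
import pandas as pd

# Two specific moments in time (hypothetical values).
start = pd.Timestamp("2024-03-01 09:00")
end = pd.Timestamp("2024-03-03 12:30")

# Subtracting one moment from another yields a span of time (a Timedelta).
elapsed = end - start

# Adding a span to a moment yields another moment.
deadline = start + pd.Timedelta(days=7)

print(elapsed)   # 2 days 03:30:00
print(deadline)  # 2024-03-08 09:00:00
```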
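
Resampling and pd.Grouper() can produce the same daily totals, as the following sketch suggests. The DataFrame and its column names are assumptions made for illustration; set_index() is chained rather than called with inplace=True.

```python
import pandas as pd

# Hypothetical transactions; note that 2024-01-03 has no rows.
df = pd.DataFrame({
    "when": pd.to_datetime([
        "2024-01-01 09:00",
        "2024-01-01 15:00",
        "2024-01-02 10:00",
        "2024-01-04 11:00",
    ]),
    "amount": [10, 20, 30, 40],
})

# resample() needs a datetime index, so set one first (chained, not in place).
daily = df.set_index("when").resample("1D")["amount"].sum()

# pd.Grouper() groups on a datetime column without changing the index.
grouped = df.groupby(pd.Grouper(key="when", freq="1D"))["amount"].sum()

print(daily.tolist())    # [30, 30, 0, 40]
print(grouped.tolist())  # [30, 30, 0, 40]
```

Both approaches create a bin for the empty day, and sum() reports it as 0.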
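
Time zone handling on a column goes through the .dt accessor. In this sketch the timestamps and the choice of zones are illustrative assumptions:

```python
import pandas as pd

# Naive timestamps carry no time zone information (hypothetical values).
naive = pd.Series(pd.to_datetime(["2024-07-01 12:00", "2024-12-01 12:00"]))

# tz_localize() declares which zone the naive values are in.
utc = naive.dt.tz_localize("UTC")

# tz_convert() expresses the same instants in another zone.
new_york = utc.dt.tz_convert("America/New_York")

print(new_york.dt.hour.tolist())  # [8, 7]
```

The two local hours differ because New York observes daylight saving time in July but not in December; the underlying instants are unchanged by the conversion.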
