Talks - Reuven M. Lerner: Times and dates in Pandas

Learn to work efficiently with dates and times in Pandas - from datetime objects and timezones to time series analysis, resampling, and best practices for temporal data manipulation.

Key takeaways
  • Pandas can store dates and times as datetime64 values, which the talk reports use about 75% less memory than the same data stored as strings and which enable rich datetime functionality

  • Use pd.to_datetime() to convert string dates to datetime objects. For CSV imports, use the parse_dates parameter of pd.read_csv()

  • Two key datetime concepts: specific moments in time (datetime objects) vs spans of time (timedelta objects)

  • Components such as year, month, day, and hour can be extracted from datetime columns via the .dt accessor

  • Time series functionality is enabled by setting a datetime column as the index using set_index()

  • Chronological grouping can be done using pd.Grouper() with frequency codes like '1D' (daily) or '1M' (monthly; pandas 2.2 and later spell the month-end code '1ME')

  • Resampling aggregates time series data at a different frequency, but requires a datetime index (or a datetime column named via the on parameter of resample())

  • Avoid using inplace=True: it is discouraged, has been proposed for deprecation, and prevents method chaining

  • Time zones can be handled using .tz_localize() to assign a zone to naive timestamps and .tz_convert() to convert between zones

  • Invalid datetime parsing can be handled with errors='coerce' to convert bad values to NaT (Not a Time)

  • Datetime columns can be compared against date strings (such as '2024-01-02') as well as datetime objects, and they sort chronologically

  • Pivot tables and groupby operations work well with datetime components for temporal analysis

  • The PyArrow backend generally handles datetime detection better than the default CSV parser
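
Several of the takeaways above can be tried together in a few lines. The following is a minimal sketch rather than code from the talk: the column names (when, amount) and the sample values are hypothetical, and it assumes a reasonably recent pandas with time zone data available.

```python
import pandas as pd

# Hypothetical sample data; the last date string is deliberately invalid.
raw = pd.DataFrame({
    "when": ["2024-01-01 09:00", "2024-01-01 15:30",
             "2024-01-02 10:00", "not a date"],
    "amount": [10.0, 20.0, 30.0, 40.0],
})

# Convert strings to datetime64; errors="coerce" turns bad values into NaT.
raw["when"] = pd.to_datetime(raw["when"], errors="coerce")
n_bad = int(raw["when"].isna().sum())

# Drop the unparseable row by reassigning instead of using inplace=True.
df = raw.dropna(subset=["when"])

# The .dt accessor extracts components from a datetime column.
hours = df["when"].dt.hour.tolist()

# A datetime column can be compared directly against a date string.
on_or_after = df[df["when"] >= "2024-01-02"]

# Subtracting two moments in time gives a span of time (a Timedelta).
span = df["when"].max() - df["when"].min()

# Chronological grouping on a column with pd.Grouper ...
daily_grouped = df.groupby(pd.Grouper(key="when", freq="1D"))["amount"].sum()

# ... and the same aggregation via resample, which needs a datetime index.
daily_resampled = df.set_index("when")["amount"].resample("1D").sum()

# Assign a zone to the naive timestamps, then convert to another zone.
utc = df["when"].dt.tz_localize("UTC")
new_york = utc.dt.tz_convert("America/New_York")
```

Here the grouping and the resampling produce the same daily totals; the difference is only whether the datetime values live in a column or in the index.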