James Powell - How Dimensional is a `pandas.DataFrame`, anyway? | PyData Amsterdam 2024
Dive deep into the true one-dimensional nature of pandas DataFrames with James Powell. Learn how understanding dimensionality impacts data modeling & performance.
- A pandas DataFrame is fundamentally one-dimensional data with a hierarchical index, despite often being described as two-dimensional
- The distinction between structural coordinates (fixed, countable, human-scale) and data coordinates (variable, uncountable, automatable) is key to understanding DataFrame dimensionality
- Index alignment is a core feature of pandas - it’s about operating on one-dimensional, index-aligned collections of data
- Group-by operations and stack/unstack are essentially equivalent - they’re both about turning one homogeneous dataset into multiple datasets
- Lists in Python are fundamentally one-dimensional and loosely homogeneous, while tuples represent one thing with multiple aspects
- NumPy arrays are fixed-size and strictly homogeneous, providing an interpretive view of contiguous memory
- pandas operations are optimized for working down the index, not across columns - this affects performance and API design
- Prices and financial data are inherently non-linear - proper modeling requires understanding this limitation
- Multi-leg trades and complex financial operations are better modeled as one-dimensional series with appropriate indexing than forced into two dimensions
- The “two-dimensional” nature of DataFrames is more about convenience in representation than actual dimensionality of the underlying data structure
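
The points about hierarchical indexes, index alignment, and the kinship between group-by and unstack can be sketched in a few lines. This is only an illustration, not code from the talk: the tickers, dates, and prices are made up, and the variable names (`prices`, `wide`, `pieces`) are hypothetical.

```python
import pandas as pd

# One-dimensional data with a two-level (hierarchical) index.
# All values here are invented for illustration.
index = pd.MultiIndex.from_product(
    [["2024-01-01", "2024-01-02"], ["AAPL", "MSFT"]],
    names=["date", "ticker"],
)
prices = pd.Series([1.0, 10.0, 2.0, 20.0], index=index, name="price")

# unstack() moves the "ticker" level into the columns, giving the
# familiar rectangular view of the same four values.
wide = prices.unstack("ticker")

# groupby() on the same level splits the one dataset into one
# sub-series per ticker.
pieces = {
    ticker: group.droplevel("ticker")
    for ticker, group in prices.groupby(level="ticker")
}

# Each group holds the same labels and values as the matching column.
same_split = all(
    pieces[ticker].to_dict() == wide[ticker].to_dict()
    for ticker in wide.columns
)

# Index alignment: arithmetic matches values by label, not by position.
a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20, 30], index=["z", "y", "x"])
total = a + b  # x: 1 + 30, y: 2 + 20, z: 3 + 10
```

In this sketch the columns of `wide` and the groups in `pieces` are two presentations of the same split along the `ticker` level, which is one way to read the claim that the rectangular layout is a convenience of representation.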
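
The contrast between lists, tuples, and NumPy arrays can also be shown briefly. This is a minimal sketch under assumed example data: the sensor name, coordinates, and readings are hypothetical and do not come from the talk.

```python
import numpy as np

# A list: many things of (loosely) the same kind, free to grow.
readings = [20.5, 21.0, 19.8]
readings.append(22.1)

# A tuple: one thing with several aspects (name, latitude, longitude).
sensor = ("sensor-7", 52.37, 4.90)
name, latitude, longitude = sensor

# A NumPy array is strictly homogeneous: mixed ints and floats are
# coerced to a single dtype.
mixed = np.array([1, 2.5])
mixed_dtype = mixed.dtype  # float64

# An array is an interpretive view of contiguous memory: the same
# 16 bytes can be read as four int32 values or as sixteen uint8 values.
raw = np.arange(4, dtype=np.int32)
as_bytes = raw.view(np.uint8)
shares = np.shares_memory(raw, as_bytes)

# Arrays are fixed-size: there is no in-place append, and np.append
# returns a new array instead of growing the original.
grown = np.append(raw, 4)
```

Here `raw` keeps its original length after `np.append`, while `readings` grows in place, which is the fixed-size versus growable distinction in miniature.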