Allen Downey - Extremes, outliers, and GOATS: on life in a lognormal world | PyData Global 2023

Learn why log-normal distributions are common in real-world data, from athletic performance to professional achievements, and how their long tails produce extreme outliers.

Key takeaways
  • Log-normal distributions are more common in real-world data than typically assumed, often providing better models than Gaussian distributions

  • Adult human weight follows a log-normal distribution, while height is modeled about equally well by Gaussian and log-normal distributions

  • Two key mechanisms contribute to log-normal distributions:

    • Proportional gain: each change is a percentage of the current value, so changes compound multiplicatively
    • Weakest-link process: overall performance is limited by the weakest contributing factor
  • Greatest of All Time (GOAT) performers are statistical outliers even among elite performers, because log-normal distributions have a long right tail

  • Elite performance requires several factors, and all of them must be present; lacking any one prevents reaching elite levels:

    • Natural talent/aptitude
    • Training opportunities
    • Persistence/passion
    • Resources
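  • A corollary of the Central Limit Theorem explains why multiplying many independent random factors tends to produce log-normal distributions: the logarithm of a product is a sum of logarithms, and such sums tend toward a Gaussian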

  • Birth weights are approximately Gaussian, but adult weights become log-normal as proportional gains accumulate over time

  • Traditional statistical methods often assume a Gaussian distribution by default, so it is worth testing whether a log-normal model fits the data better

  • Log-normal patterns appear in diverse fields including:

    • Athletic performance
    • Chess ratings
    • Musical ability
    • Professional achievements
  • Quantitative comparison between models can be done by:

    • Comparing CDFs
    • Calculating the area between the empirical CDF and each model's CDF
    • Using maximum likelihood methods
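The Central Limit Theorem corollary mentioned above fits in one line. This is a standard sketch, assuming the factors X_i are positive, independent, and identically distributed, with log X_i having finite mean mu and variance sigma^2; the symbols are introduced here for illustration and are not taken from the talk.

```latex
X = \prod_{i=1}^{n} X_i
\quad\Longrightarrow\quad
\log X = \sum_{i=1}^{n} \log X_i
\;\overset{\text{approx.}}{\sim}\; \mathcal{N}\!\left(n\mu,\; n\sigma^{2}\right)
\quad \text{for large } n,
\qquad \mu = \mathbb{E}[\log X_i],\;\; \sigma^{2} = \operatorname{Var}[\log X_i]
```

Since log X is approximately Gaussian, X itself is approximately log-normal. If the same factors were added instead of multiplied, the ordinary CLT would give an approximately Gaussian total.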
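The proportional-gain mechanism, and the shift from roughly Gaussian birth weights to log-normal adult weights, can be illustrated with a small simulation. This is a minimal sketch under assumed parameters: the starting weights, the per-step percentage changes, and the names `simulate_proportional_gain` and `skewness` are hypothetical, chosen for illustration and not taken from the talk or from real data. If the mechanism works as described, the simulated weights should be clearly right-skewed while their logarithms are close to symmetric.

```python
import math
import random
import statistics


def skewness(xs):
    """Sample skewness: mean cubed z-score (0 for a symmetric distribution)."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return statistics.fmean([((x - m) / s) ** 3 for x in xs])


def simulate_proportional_gain(n_people=5_000, n_steps=100, seed=1):
    """Hypothetical model: each step changes weight by a random percentage.

    Starting weights come from a Gaussian (a stand-in for birth weight);
    every parameter here is an illustrative assumption.
    """
    rng = random.Random(seed)
    weights = []
    for _ in range(n_people):
        w = max(rng.gauss(3.3, 0.5), 0.5)  # assumed start in kg, clipped to stay positive
        for _ in range(n_steps):
            w *= 1 + rng.gauss(0.03, 0.03)  # gain is a percentage of the *current* weight
        weights.append(w)
    return weights


weights = simulate_proportional_gain()
log_weights = [math.log(w) for w in weights]

print(f"skewness of weights:     {skewness(weights):.2f}")  # clearly positive (right tail)
print(f"skewness of log weights: {skewness(log_weights):.2f}")  # close to 0
```

Because each gain is multiplied onto the current weight, the log of the final weight is a sum of many small terms, which is exactly the situation the CLT corollary describes.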
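The claim that elite performance needs every factor at once can be explored with a toy multiplicative model, in which a person's score is the product of several independent factors. This is a hedged sketch, not the talk's model: the four factors, their log-normal spread, and the name `simulate_performance` are assumptions for illustration. Because the score is a product, a low value in any single factor pulls the whole score down, so the top 1% should be dominated by people who are above the median on every factor.

```python
import math
import random


def simulate_performance(n_people=50_000, n_factors=4, seed=2):
    """Hypothetical model: score = product of independent factors.

    Each factor (talent, training, persistence, resources, ...) is drawn from
    a log-normal distribution with median 1; all parameters are assumptions.
    """
    rng = random.Random(seed)
    people = []
    for _ in range(n_people):
        factors = [rng.lognormvariate(0.0, 0.5) for _ in range(n_factors)]
        people.append((math.prod(factors), factors))
    return people


people = simulate_performance()
people.sort(key=lambda p: p[0], reverse=True)
elite = people[: len(people) // 100]  # top 1% by score

# "Strong" here means above the factor's median, which is 1 for lognormvariate(0, s).
share_elite = sum(all(f > 1.0 for f in fs) for _, fs in elite) / len(elite)
share_everyone = sum(all(f > 1.0 for f in fs) for _, fs in people) / len(people)

print(f"top 1% strong on every factor:    {share_elite:.0%}")
print(f"whole population, same criterion: {share_everyone:.0%}")  # about 0.5**4
```

In the population as a whole only about one person in sixteen is above the median on all four factors, so a much higher share among the top 1% is the signature of the "all factors must be present" effect under this assumed model.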
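The GOAT point is about tails: under a log-normal model the single best performer can sit far beyond the rest of the elite group, while under a Gaussian the top values bunch together. A minimal sketch, assuming synthetic data with a log-scale spread of 1 and a Gaussian matched to the same mean and standard deviation, measures how many standard deviations the maximum lies above (approximately) the 99th percentile; `standardized_gap` is a hypothetical helper name.

```python
import math
import random
import statistics


def standardized_gap(sample):
    """Standard deviations between the single best value and (approximately)
    the 99th percentile, used here as the 'elite' threshold."""
    xs = sorted(sample)
    p99 = xs[int(0.99 * len(xs))]
    return (xs[-1] - p99) / statistics.pstdev(xs)


rng = random.Random(3)
n = 100_000
sigma = 1.0  # assumed log-scale spread of the log-normal

# Mean and standard deviation of a log-normal with mu = 0 and this sigma,
# so the Gaussian below is matched on both.
ln_mean = math.exp(sigma**2 / 2)
ln_sd = math.sqrt((math.exp(sigma**2) - 1) * math.exp(sigma**2))

gaussian = [rng.gauss(ln_mean, ln_sd) for _ in range(n)]
lognormal = [rng.lognormvariate(0.0, sigma) for _ in range(n)]

gap_gaussian = standardized_gap(gaussian)
gap_lognormal = standardized_gap(lognormal)
print(f"Gaussian:   best is {gap_gaussian:.1f} SDs above the 99th percentile")
print(f"log-normal: best is {gap_lognormal:.1f} SDs above the 99th percentile")
```

Matching the mean and standard deviation keeps the comparison fair: any difference in the gap comes from the shape of the tail, not from scale.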
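The three comparison methods listed above (comparing CDFs, the area between curves, and maximum likelihood) can be combined in one short script. This sketch assumes synthetic log-normal data rather than any dataset from the talk. It fits each model by maximum likelihood (mean and population standard deviation of the data for the Gaussian, of the log data for the log-normal) and reports the largest CDF gap, the approximate area between the empirical and model CDFs on a grid, and the total log-likelihood. The helper names `ecdf`, `cdf_distances`, and `normal_logpdf` are hypothetical.

```python
import bisect
import math
import random
import statistics


def ecdf(sorted_xs, x):
    """Empirical CDF: fraction of observations <= x."""
    return bisect.bisect_right(sorted_xs, x) / len(sorted_xs)


def cdf_distances(sorted_xs, model_cdf, n_grid=2_000):
    """Largest gap and approximate area between the empirical and model CDFs,
    evaluated at the midpoints of an evenly spaced grid over the data range."""
    lo, hi = sorted_xs[0], sorted_xs[-1]
    step = (hi - lo) / n_grid
    max_gap, area = 0.0, 0.0
    for i in range(n_grid):
        x = lo + (i + 0.5) * step
        gap = abs(ecdf(sorted_xs, x) - model_cdf(x))
        max_gap = max(max_gap, gap)
        area += gap * step
    return max_gap, area


def normal_logpdf(x, mu, sigma):
    """Log density of a Gaussian, computed directly to avoid underflow."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2 * math.pi)


# Synthetic, right-skewed data (an assumption for illustration only).
rng = random.Random(4)
data = sorted(rng.lognormvariate(4.0, 0.5) for _ in range(2_000))
logs = [math.log(x) for x in data]

# Maximum-likelihood fits: Gaussian on the data, Gaussian on the log data.
mean, sd = statistics.fmean(data), statistics.pstdev(data)
mu, sigma = statistics.fmean(logs), statistics.pstdev(logs)
gauss = statistics.NormalDist(mean, sd)
log_gauss = statistics.NormalDist(mu, sigma)

gap_g, area_g = cdf_distances(data, gauss.cdf)
gap_ln, area_ln = cdf_distances(data, lambda x: log_gauss.cdf(math.log(x)))

ll_g = sum(normal_logpdf(x, mean, sd) for x in data)
# Log-normal density: Gaussian density of log(x), times the Jacobian 1/x.
ll_ln = sum(normal_logpdf(lx, mu, sigma) - lx for lx in logs)

for name, gap, area, ll in [("Gaussian", gap_g, area_g, ll_g),
                            ("log-normal", gap_ln, area_ln, ll_ln)]:
    print(f"{name:>10}: max CDF gap {gap:.3f}, area {area:.2f}, log-likelihood {ll:.1f}")
```

The three numbers answer slightly different questions: the maximum gap flags the single worst mismatch, the area summarizes mismatch across the whole range, and the log-likelihood rewards the model that makes the observed data most probable. Applied to real data, the same script can be pointed at any positive-valued sample in place of the synthetic one.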