Denis Fedoseev - How Geometry Helps in Data Analytics and ML Problems | PyData Yerevan Oct. 2022 Meetup

Learn how geometry helps in data analysis and machine learning: understanding manifolds, the Minkowski dimension, and geodesic shooting can improve accuracy and enable dimension reduction in complex data sets.

Key takeaways
  • Geometry helps in data analysis and machine learning through a better understanding of manifolds and their dimensions.
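  • A manifold is a set of points that locally resembles a deformed disk, that is, a piece of low-dimensional space, even when it sits inside a high-dimensional Euclidean space.
  • The Minkowski (box-counting) dimension is a measure of how “skinny” a set is: it captures how the number of small boxes needed to cover the set grows as the boxes shrink, and can be used to better understand data distributions.
  • Geodesic shooting is an algorithm for finding geodesics, the curves on a manifold that are locally the shortest paths between points.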
  • Manifolds can be used in machine learning to improve the accuracy of classification and clustering.
  • The dimension reduction can be tremendous, from over 1,700 dimensions down to 30, which makes the data much easier to work with.
  • The Euclidean distance metric may not be the best choice for datasets that are not uniformly distributed.
  • The Minkowski dimension is tied to how the volume of a set scales and can help improve the accuracy of distance calculations.
  • The ambient space is not always needed: a metric can sometimes be defined on the manifold itself.
  • Constructing a manifold and estimating its dimension is difficult, and the limited amount of available data makes it harder still.
  • There are many open challenges in this area, and more research is needed to improve the methods for manifold construction and dimension estimation.
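
To make the Minkowski dimension takeaway concrete, here is a minimal sketch of a box-counting estimate. It is not taken from the talk: the function name `box_counting_dimension`, the grid scales, and the test data (a circle placed in a 10-dimensional ambient space) are all illustrative assumptions. The idea is to count the grid cells of side eps that contain at least one point and fit the slope of log N(eps) against log(1/eps).

```python
import math
import random


def box_counting_dimension(points, scales):
    """Estimate the Minkowski (box-counting) dimension of a point cloud.

    For each cell size eps, count the grid cells that contain at least one
    point, then fit the slope of log N(eps) versus log(1/eps) by least squares.
    """
    xs, ys = [], []
    for eps in scales:
        # Each point is mapped to the integer index of its grid cell.
        occupied = {tuple(math.floor(c / eps) for c in p) for p in points}
        xs.append(math.log(1.0 / eps))
        ys.append(math.log(len(occupied)))
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


# Illustrative data (an assumption, not from the talk): a circle, which is a
# 1-dimensional manifold, sampled inside a 10-dimensional ambient space.
random.seed(0)
ambient_dim = 10
circle = []
for _ in range(20000):
    t = random.uniform(0.0, 2.0 * math.pi)
    circle.append((math.cos(t), math.sin(t)) + (0.0,) * (ambient_dim - 2))

dim_estimate = box_counting_dimension(circle, [0.2, 0.1, 0.05, 0.025])
print(f"estimated dimension: {dim_estimate:.2f}")
```

The estimate should land near 1 rather than 10, because the count of occupied cells depends on the shape of the data and not on the number of ambient coordinates. With few points or very small cells the count saturates and the estimate degrades, which is one face of the limited-data difficulty noted above.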
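
The point about Euclidean distance can also be illustrated with a small sketch. Note that this is not geodesic shooting: as a simpler stand-in, it approximates distance along the manifold by the shortest path in a neighborhood graph (the idea behind Isomap). The helper names `euclidean` and `graph_geodesic`, the half-circle data, and the neighborhood radius are assumptions made for illustration.

```python
import heapq
import math


def euclidean(p, q):
    """Straight-line distance in the ambient space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def graph_geodesic(points, source, target, radius):
    """Approximate the distance along the data by a shortest path in a graph
    that links only points closer than `radius` (Dijkstra's algorithm)."""
    n = len(points)
    neighbors = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = euclidean(points[i], points[j])
            if d <= radius:
                neighbors[i].append((j, d))
                neighbors[j].append((i, d))
    dist = [math.inf] * n
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, i = heapq.heappop(heap)
        if i == target:
            return d
        if d > dist[i]:
            continue  # stale heap entry
        for j, w in neighbors[i]:
            nd = d + w
            if nd < dist[j]:
                dist[j] = nd
                heapq.heappush(heap, (nd, j))
    return math.inf  # target not reachable with this radius


# Illustrative data (an assumption): 200 points evenly spaced on a half circle.
n = 200
half_circle = [
    (math.cos(math.pi * k / (n - 1)), math.sin(math.pi * k / (n - 1)))
    for k in range(n)
]

straight = euclidean(half_circle[0], half_circle[-1])
along = graph_geodesic(half_circle, 0, n - 1, radius=0.05)
print(f"Euclidean: {straight:.3f}, along the curve: {along:.3f}")
```

The two endpoints are a distance of about 2 apart in the ambient plane but roughly pi apart along the curve, so a method that relies on Euclidean distance would treat them as closer than the data's own geometry suggests. The result is sensitive to the radius: too small and the graph disconnects, too large and the path shortcuts across the gap.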