Rapid Feature Harvesting Using DFS & Data Engineering Techniques • Ananth Gundabattula • YOW! 2019
Discover how to rapidly harvest new features using Deep Feature Synthesis (DFS) and data engineering techniques: automating the costly feature engineering process and generating features in parallel, with Apache Calcite for improved scalability and efficiency.
Rapid Feature Harvesting is a technique that combines DFS (Deep Feature Synthesis) with data engineering to quickly generate new features from large datasets.
 - Feature engineering is a costly process, especially for large datasets, and there are efforts to automate it.
 - The speaker demonstrates a feature harvesting library that can generate features in parallel, using a graph-based approach.
 - The library uses a base feature definition to generate new features by applying relationships between columns.
 - Features can be categorized as direct, aggregation, or join-based, depending on the strategy used to generate them.
 - The speaker discusses the importance of lineage and metadata in feature engineering, and how the library provides functionality for tracking and managing both.
 - He also mentions the challenge of feature explosion, where too many features are generated, making it difficult to identify the most relevant ones.
 - The library uses Apache Calcite to optimize the feature generation process, making it more efficient and scalable.
 - The speaker discusses the application of the feature harvesting library in various domains, including retail, finance, and healthcare.
 - He also mentions the potential for applying the technique in other areas, such as streaming data and real-time analytics.
 - Example features generated include the "max of average transaction amounts across sessions" and the "count of transactions per customer".
 - The speaker highlights the potential benefits of the technique, including reduced development time and cost, and improved feature selection.
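The idea of building new features by stacking aggregations along relationships, with the feature name recording its own lineage, can be sketched in a few lines of plain Python. This is a minimal illustration of the general deep feature synthesis idea, not the speaker's library: the toy `customers -> sessions -> transactions` tables, the `dfs` function, and the chosen primitives (`COUNT`, `SUM`, `MEAN`, `MAX`) are all assumptions made for the example.

```python
from statistics import mean

# Hypothetical toy entity set (parent -> child): customers -> sessions -> transactions.
ENTITIES = {
    "customers": [{"id": "a"}, {"id": "b"}],
    "sessions": [
        {"id": 1, "customer_id": "a"},
        {"id": 2, "customer_id": "a"},
        {"id": 3, "customer_id": "b"},
    ],
    "transactions": [
        {"id": 10, "session_id": 1, "amount": 10.0},
        {"id": 11, "session_id": 1, "amount": 30.0},
        {"id": 12, "session_id": 2, "amount": 50.0},
        {"id": 13, "session_id": 3, "amount": 5.0},
        {"id": 14, "session_id": 3, "amount": 15.0},
    ],
}
# Relationships as (parent, child, foreign-key column on the child).
RELATIONSHIPS = [
    ("customers", "sessions", "customer_id"),
    ("sessions", "transactions", "session_id"),
]
# Numeric base columns per entity (hypothetical).
BASE_COLUMNS = {"transactions": ["amount"]}
# Aggregation primitives applied to numeric child features.
AGG_PRIMITIVES = {"SUM": sum, "MEAN": mean, "MAX": max}


def dfs(entity, depth):
    """Return {feature_name: {row_id: value}} for `entity`, stacking aggregations up to `depth`."""
    rows = ENTITIES[entity]
    # Base features: the entity's own numeric columns.
    features = {
        col: {r["id"]: r[col] for r in rows} for col in BASE_COLUMNS.get(entity, [])
    }
    if depth == 0:
        return features
    for parent, child, fk in RELATIONSHIPS:
        if parent != entity:
            continue
        # Group child row ids under the parent row they point to.
        groups = {r["id"]: [] for r in rows}
        for c in ENTITIES[child]:
            groups[c[fk]].append(c["id"])
        features[f"COUNT({child})"] = {pid: len(ids) for pid, ids in groups.items()}
        # Recurse: child features (depth - 1) become inputs to aggregations here.
        for name, values in dfs(child, depth - 1).items():
            for prim, fn in AGG_PRIMITIVES.items():
                column = {}
                for pid, ids in groups.items():
                    vals = [values[i] for i in ids if values[i] is not None]
                    column[pid] = fn(vals) if vals else None
                # The name encodes the full lineage of the feature.
                features[f"{prim}({child}.{name})"] = column
    return features


customer_features = dfs("customers", depth=2)
print(customer_features["MAX(sessions.MEAN(transactions.amount))"])  # {'a': 50.0, 'b': 10.0}
print(customer_features["SUM(sessions.COUNT(transactions))"])        # {'a': 3, 'b': 2}
```

Both example features from the talk fall out of the same recursion: the max of average transaction amounts across sessions, and the count of transactions per customer (expressed here as a sum of per-session counts). Even this toy setup yields 13 customer features at depth 2, which hints at how quickly the feature space grows.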
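Feature explosion is usually tackled with a cheap filtering pass before any model-based feature selection. The sketch below shows one such pass, dropping constant features and exact duplicates; it is a common general technique and an assumption here, not necessarily what the speaker's library does. The `prune` function and the candidate values are hypothetical.

```python
def prune(features):
    """Drop constant features and exact duplicates, keeping the first name seen per value vector."""
    kept, seen = {}, set()
    for name, column in features.items():
        # Fix a row order so value vectors from different features are comparable.
        values = tuple(column[k] for k in sorted(column))
        if len(set(values)) <= 1:
            continue  # constant across all rows: carries no signal
        if values in seen:
            continue  # identical to a feature already kept
        seen.add(values)
        kept[name] = column
    return kept


# Hypothetical candidate features for three customers.
candidates = {
    "COUNT(sessions)": {"a": 2, "b": 1, "c": 1},
    "SUM(sessions.COUNT(transactions))": {"a": 3, "b": 2, "c": 2},
    "MAX(sessions.COUNT(transactions))": {"a": 2, "b": 2, "c": 2},
    "MAX(sessions.MEAN(transactions.amount))": {"a": 50.0, "b": 10.0, "c": 7.5},
    "MAX(sessions.MAX(transactions.amount))": {"a": 50.0, "b": 10.0, "c": 7.5},
}
print(list(prune(candidates)))
```

Here the constant `MAX(sessions.COUNT(transactions))` and the duplicate `MAX(sessions.MAX(transactions.amount))` are dropped, leaving three features. Filters like this only remove obviously redundant features; ranking the remaining ones by relevance still needs a selection step.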
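The talk uses Apache Calcite to make feature generation more efficient. Calcite is a Java framework and is not used below. The sketch only illustrates the general idea of compiling a stacked feature definition into a single SQL query that a query planner can then optimize and execute. Python's built-in `sqlite3` stands in for the engine, and the `stacked_agg_sql` helper plus the table and column names are hypothetical.

```python
import sqlite3


def stacked_agg_sql(outer_fn, inner_fn, column,
                    child="transactions", child_fk="session_id",
                    parent="sessions", parent_fk="customer_id"):
    """Render OUTER(parent.INNER(child.column)) as one SQL query grouped by parent_fk.

    Identifiers are interpolated directly, so only use trusted, generated names.
    """
    return (
        f"SELECT p.{parent_fk} AS entity_id, {outer_fn}(c.f) AS feature "
        f"FROM {parent} AS p "
        f"JOIN (SELECT {child_fk}, {inner_fn}({column}) AS f "
        f"FROM {child} GROUP BY {child_fk}) AS c "
        f"ON c.{child_fk} = p.id "
        f"GROUP BY p.{parent_fk} ORDER BY p.{parent_fk}"
    )


# Hypothetical toy tables in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sessions (id INTEGER PRIMARY KEY, customer_id TEXT);
CREATE TABLE transactions (id INTEGER PRIMARY KEY, session_id INTEGER, amount REAL);
INSERT INTO sessions VALUES (1, 'a'), (2, 'a'), (3, 'b');
INSERT INTO transactions VALUES
    (10, 1, 10.0), (11, 1, 30.0), (12, 2, 50.0), (13, 3, 5.0), (14, 3, 15.0);
""")

# Max of average transaction amounts across sessions, per customer.
print(conn.execute(stacked_agg_sql("MAX", "AVG", "amount")).fetchall())
# Count of transactions per customer (sum of per-session counts).
print(conn.execute(stacked_agg_sql("SUM", "COUNT", "*")).fetchall())
```

Emitting SQL rather than computing in application code lets the engine choose join order and aggregation strategy, which is the kind of optimization a planner such as Calcite is designed to provide.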
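Features at the same level are independent of one another, which is what makes generating them in parallel possible. The sketch below shows that structure with `concurrent.futures.ThreadPoolExecutor`. It is an assumption for illustration, not the speaker's implementation, and the `harvest_parallel` function and data are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean

# Hypothetical child rows: (customer_id, amount).
TRANSACTIONS = [("a", 10.0), ("a", 30.0), ("a", 50.0), ("b", 5.0), ("b", 15.0)]

# Independent feature definitions: name -> aggregation over a customer's amounts.
FEATURE_DEFS = {
    "COUNT(transactions)": len,
    "SUM(transactions.amount)": sum,
    "MEAN(transactions.amount)": mean,
    "MAX(transactions.amount)": max,
}


def group_amounts(rows):
    """Group transaction amounts by customer id."""
    groups = {}
    for customer_id, amount in rows:
        groups.setdefault(customer_id, []).append(amount)
    return groups


def compute_feature(item, groups):
    """Evaluate one feature definition over every customer group."""
    name, fn = item
    return name, {cid: fn(vals) for cid, vals in groups.items()}


def harvest_parallel(rows, defs, workers=4):
    """Compute all feature definitions concurrently; each task is independent."""
    groups = group_amounts(rows)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda item: compute_feature(item, groups), defs.items())
        return dict(results)


features = harvest_parallel(TRANSACTIONS, FEATURE_DEFS)
print(features["MEAN(transactions.amount)"])  # {'a': 30.0, 'b': 10.0}
```

Python threads only overlap I/O-bound work because of the global interpreter lock, so CPU-heavy feature computation would need processes or, more realistically at scale, a distributed or SQL execution engine. The fan-out over independent feature definitions stays the same either way.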