The more data, the better the AI, isn’t it? | Michael Kieweg

Michael Kieweg

Explore the nuanced relationship between data and AI performance, highlighting the importance of data quality, context, and human oversight in achieving accurate information extraction and classification.

Key takeaways
  • The idea that “more data, the better the AI” is oversimplified, as data quality and context are crucial factors in AI performance.
  • Leverton focuses on real estate contracts and other legal documents, which require complex information extraction and classification.
  • Optical character recognition (OCR) is necessary for converting images to searchable text, but it can be challenging, especially for documents with complex layouts or handwritten text.
  • An AI model is shaped by its training data, which must be carefully curated and annotated to ensure accuracy.
  • Human reviewers are needed to correct and validate the machine’s output, a process that can be time-consuming and labor-intensive.
  • A two-step review process can improve accuracy and reduce errors.
  • Post-processing of OCR output is important, as it can significantly improve the quality of the extracted text.
  • Leverton’s software uses deep learning technologies to automatically extract information from documents, but human expertise is still required for data cleansing and annotation.
  • The company has a large team of technical consultants who work closely with customers to set up and refine the data model and the AI.
  • The AI system draws on information across different documents to identify patterns and relationships, but extracting relevant information from unstructured text remains challenging.
  • Leverton’s software is used by over 100 customers who require high-quality data and reliable information extraction.
  • The company prioritizes data security and transparency, ensuring that customers have control over their data.
  • Proper naming and clear descriptions in the data model matter, as they can significantly affect the accuracy of the extracted information.
  • The AI system combines machine learning with human expertise to extract and classify data points, a task that can be challenging for complex documents with multiple sections and conflicting information.
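The takeaways stress that post-processing can significantly improve OCR output, but they do not say which steps Leverton applies. The following is a minimal, hypothetical sketch of two common clean-up steps, assuming plain-text OCR output: rejoining words hyphenated across a line break, and repairing letters that OCR engines often confuse with digits (for example "O" for "0") inside tokens that are mostly numeric. The function names and the substitution table are illustrative assumptions, not part of any product.

```python
import re

# Hypothetical substitution table: letters OCR engines commonly misread in
# place of digits. Applied only to whole tokens that are mostly digits.
DIGIT_CONFUSIONS = str.maketrans({"O": "0", "o": "0", "l": "1", "I": "1", "S": "5"})


def rejoin_hyphenated(text: str) -> str:
    """Join words split by a hyphen at a line break, e.g. 'agree-\\nment' -> 'agreement'."""
    return re.sub(r"(\w)-\n(\w)", r"\1\2", text)


def fix_numeric_tokens(text: str) -> str:
    """Replace look-alike letters in whole tokens dominated by digits, e.g. '2O19' -> '2019'."""

    def repair(match: re.Match) -> str:
        token = match.group(0)
        digits = sum(ch.isdigit() for ch in token)
        # Only touch tokens where digits make up at least half the characters,
        # so ordinary words such as "Sold" or "Office" are left alone.
        if digits >= len(token) / 2:
            return token.translate(DIGIT_CONFUSIONS)
        return token

    # \b on both sides restricts matches to whole tokens built from digits,
    # look-alike letters, and thousands/decimal separators.
    return re.sub(r"\b[0-9OolIS][0-9OolIS.,]*\b", repair, text)


def postprocess_ocr(text: str) -> str:
    """Apply the hypothetical clean-up steps in order."""
    text = rejoin_hyphenated(text)
    text = re.sub(r"[ \t]+", " ", text)  # collapse runs of spaces and tabs
    return fix_numeric_tokens(text)
```

For example, `postprocess_ocr("Monthly rent:   EUR 1,2O0 per agree-\nment")` yields `"Monthly rent: EUR 1,200 per agreement"`. Rules like these are deliberately conservative: the hyphen rule would also merge a genuine compound split at a line break (such as "real-\nestate"), which is one reason the takeaways pair automated steps with human review.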
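The takeaways mention that human reviewers correct and validate the machine’s output and that a two-step review can reduce errors, without describing the workflow in detail. Below is a minimal sketch of one way such a review could be structured, assuming each extracted value passes through a first reviewer who may correct it and a second reviewer who validates the result. The `Extraction` record, the reviewer callables, and the rule that flags an item when the second reviewer still changes the value are all hypothetical illustrations, not Leverton’s actual process.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Extraction:
    field_name: str                  # hypothetical data-model field, e.g. "monthly_rent"
    value: str                       # value proposed by the model
    history: List[str] = field(default_factory=list)
    needs_escalation: bool = False   # set when the two reviewers disagree


# A reviewer returns a corrected value, or None to accept the current value.
Reviewer = Callable[[Extraction], Optional[str]]


def review_step(item: Extraction, reviewer: Reviewer, step: str) -> bool:
    """Apply one review step and return True if the reviewer changed the value."""
    correction = reviewer(item)
    if correction is None or correction == item.value:
        item.history.append(f"{step}: confirmed {item.value!r}")
        return False
    item.history.append(f"{step}: {item.value!r} -> {correction!r}")
    item.value = correction
    return True


def two_step_review(items: List[Extraction], first: Reviewer, second: Reviewer) -> List[Extraction]:
    """Run every item through two review steps and keep an audit trail."""
    for item in items:
        review_step(item, first, "step 1")
        # If the second reviewer still changes the value, the two humans
        # disagree, so the item is flagged instead of being silently finalized.
        if review_step(item, second, "step 2"):
            item.needs_escalation = True
    return items
```

Keeping the `history` list per item makes each correction traceable, which fits the emphasis on transparency; whether flagged items go to a third reviewer or back to a consultant is left open here.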