The more data, the better the AI, isn’t it? | Michael Kieweg

Explore how data relates to AI performance, and why data quality, context, and human oversight matter for accurate information extraction and classification.

Key takeaways
  • The idea that “more data, the better the AI” is an oversimplification: data quality and context are crucial to AI performance.
  • Leverton focuses on real estate contracts and legal documents, which require complex information extraction and classification.
  • Optical character recognition (OCR) is necessary for converting images to searchable text, but it can be challenging, especially for documents with complex layouts or handwritten text.
  • The AI model is shaped by its training data, which must be carefully curated and annotated to ensure accuracy.
  • Human reviewers are needed to correct and validate the machine’s output, a step that can be time-consuming and labor-intensive.
  • A two-step review process can improve accuracy and reduce errors.
  • Post-processing in OCR can significantly improve the quality of the extracted text.
  • Leverton’s software uses deep learning technologies to automatically extract information from documents, but human expertise is still required for data cleansing and annotation.
  • The company has a large team of technical consultants who work closely with customers to set up and refine the data model and AI.
  • The AI system uses information from different documents to identify patterns and relationships, but it can be challenging to extract relevant information from unstructured texts.
  • Leverton’s software is used by over 100 customers who require high-quality data and reliable information extraction.
  • The company prioritizes data security and transparency, ensuring that customers have control over their data.
  • Proper naming and descriptions in the data model matter, as they can significantly affect the accuracy of the extracted information.
  • The AI system combines machine learning with human expertise to extract and classify data points, a task made harder by complex documents with multiple sections and conflicting information.
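
To make the OCR post-processing takeaway concrete, here is a minimal Python sketch of two common clean-up steps: rejoining words hyphenated across a line break, and repairing letters misread as digits inside mostly numeric tokens. The confusion table, the function names, and the "at least half digits" threshold are illustrative assumptions, not a description of Leverton's software.

```python
import re

# Assumed confusion table: letters that OCR engines commonly misread for digits.
DIGIT_FIXES = str.maketrans({"O": "0", "o": "0", "l": "1", "I": "1", "S": "5", "B": "8"})


def fix_numeric_token(token: str) -> str:
    """Apply digit fixes only when at least half of the token is already digits."""
    digits = sum(ch.isdigit() for ch in token)
    if digits == 0 or digits * 2 < len(token):
        return token  # looks like an ordinary word, leave it untouched
    return token.translate(DIGIT_FIXES)


def postprocess(text: str) -> str:
    """Heuristic clean-up of raw OCR text (a sketch, not a complete pipeline)."""
    # Rejoin words that were hyphenated across a line break: "ten-\nant" -> "tenant".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Collapse runs of spaces and tabs into a single space.
    text = re.sub(r"[ \t]+", " ", text)
    # Repair digit look-alikes token by token, keeping the whitespace as it is.
    return re.sub(r"\S+", lambda m: fix_numeric_token(m.group()), text)


raw = "Rent: 1O5O EUR per\nmonth, ten-\nant pays"
cleaned = postprocess(raw)
```

A rule-based pass like this is deliberately conservative: tokens that are mostly letters are left alone, so ordinary words are not rewritten. Anything the heuristics cannot resolve would still go to a human reviewer, in line with the review steps listed above.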