Sujit Pal - Building Learning to Rank models for search using LLMs | PyData Global 2023

Learn how to build Learning-to-Rank models using LLMs to generate training data. Compare different ranking approaches and explore feature engineering for search relevance.

Key takeaways
  • LLMs can be used to generate relevance judgments for training Learning-to-Rank (LTR) models, reducing the need for expensive human annotations (a judgment-prompt sketch follows this item)

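A minimal sketch of that judgment-generation step, assuming the anthropic Python SDK; the model name, prompt wording, and 0-4 grading scale here are illustrative, not the speaker's exact setup:

```python
# Ask an LLM to grade query-document relevance; a sketch, not the talk's exact prompt.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

PROMPT = """Rate how relevant the document is to the query on a 0-4 scale,
where 0 = not relevant and 4 = perfectly relevant. Answer with a single digit.

Query: {query}
Document: {document}"""

def judge_relevance(query: str, document: str) -> int:
    response = client.messages.create(
        model="claude-3-haiku-20240307",  # illustrative model choice
        max_tokens=5,
        messages=[{"role": "user", "content": PROMPT.format(query=query, document=document)}],
    )
    return int(response.content[0].text.strip())
```
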
  • The speaker implemented and compared four different LTR models (a RankNet loss sketch follows the list):

    • Point-wise regression
    • RankNet (pairwise)
    • LambdaRank
    • LambdaMart
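
Since RankNet came out ahead among the learned models, here is a minimal sketch of its pairwise objective in PyTorch (the framework and network shape are assumptions; only the 61-feature input width comes from the talk):

```python
# RankNet sketch: train a scorer so that sigmoid(s_i - s_j) predicts
# whether document i should outrank document j for the same query.
import torch
import torch.nn as nn

class Scorer(nn.Module):
    def __init__(self, n_features: int = 61):  # 61 features per query-document pair
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_features, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).squeeze(-1)

def ranknet_loss(scorer: Scorer, x_i, x_j, label):
    # label is 1.0 where doc i is more relevant than doc j, else 0.0
    score_diff = scorer(x_i) - scorer(x_j)
    return nn.functional.binary_cross_entropy_with_logits(score_diff, label)
```
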
  • RankNet performed best of the four LTR models, achieving a precision@10 of 8.38, close to the 8.50 of the hand-tuned baseline

  • Key implementation details (an inference-pipeline sketch follows this list):

    • Used 61 features per query-document pair
    • Combined multiple data sources including lexical, vector, and knowledge graph features
    • Used Claude AI for generating relevance judgments
    • Implemented as both training and inference pipelines
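
A hedged sketch of what the inference side of such a pipeline might look like: featurize each (query, document) pair among the retrieved candidates and re-rank by the learned score. `featurize` and `model` are hypothetical stand-ins for the talk's 61-feature extractor and trained LTR model:

```python
# Re-rank top-k candidates by the learned model's score; a sketch only.
import numpy as np

def rerank(query, candidates, featurize, model, k=10):
    X = np.array([featurize(query, doc) for doc in candidates])  # shape (n_docs, 61)
    scores = model.predict(X)    # any scikit-style scorer works here
    order = np.argsort(-scores)  # descending by score
    return [candidates[i] for i in order[:k]]
```
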
  • LLM-generated judgments agreed with human expert judgments on roughly 70% of pairs, though the LLM tended to be more lenient (see the agreement sketch below)

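One simple way to quantify such an overlap, given aligned lists of LLM and human grades for the same pairs; Cohen's kappa is added here as an extra sanity check, not a number the talk reported:

```python
# Percent agreement and chance-corrected agreement between two judges.
import numpy as np
from sklearn.metrics import cohen_kappa_score

def agreement(llm_grades, human_grades):
    llm, human = np.asarray(llm_grades), np.asarray(human_grades)
    overlap = float((llm == human).mean())  # fraction of identical grades
    kappa = cohen_kappa_score(llm, human)   # agreement beyond chance
    return overlap, kappa
```
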
  • The approach requires far fewer labels: judging only 5-10% of query-document pairs suffices, rather than exhaustive labeling (see the sampling sketch below)

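A trivial sketch of that sparse labeling, assuming the candidate (query, document) pairs are materialized as a list; the fraction and seed are arbitrary:

```python
# Sample a small fraction of pairs to send for LLM judgment.
import random

def sample_pairs(pairs, frac=0.1, seed=42):
    rng = random.Random(seed)
    n = max(1, int(len(pairs) * frac))
    return rng.sample(pairs, n)
```
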
  • Feature engineering combined multiple approaches (a feature-computation sketch follows the list):

    • Term frequency and TF-IDF features
    • Concept and semantic group overlap
    • Vector similarities
    • Knowledge graph features
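
An illustrative sketch of two of these feature families, lexical TF-IDF similarity and dense-vector cosine similarity; `embed` is a hypothetical embedding function, and the TfidfVectorizer is assumed to be already fitted on the corpus. Concept-overlap and knowledge-graph features would be appended to the same vector:

```python
# Two example features: TF-IDF cosine and embedding cosine for a (query, doc) pair.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def lexical_and_vector_features(query, doc, tfidf: TfidfVectorizer, embed):
    q_tf, d_tf = tfidf.transform([query]), tfidf.transform([doc])
    tfidf_sim = float(cosine_similarity(q_tf, d_tf)[0, 0])
    q_vec, d_vec = embed(query), embed(doc)
    vec_sim = float(np.dot(q_vec, d_vec) / (np.linalg.norm(q_vec) * np.linalg.norm(d_vec)))
    return [tfidf_sim, vec_sim]
```
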
  • Primary advantages:

    • Reduces manual labeling effort
    • Achieves comparable performance to hand-tuned systems
    • Can be implemented with relatively little human intervention
    • Provides interpretable features compared to pure vector approaches
  • Main challenges identified:

    • LLMs sometimes make relevance leaps that humans wouldn’t
    • Performance degrades as relevance scores increase
    • Data distribution can be imbalanced
  • The approach is particularly valuable in domains that lack user feedback signals (e.g., healthcare), unlike e-commerce, where click data is abundant