Better search relevance using Learning to Rank at mobile.de [PyCon DE & PyData Berlin 2024]

Learn how mobile.de built a fast Python search ranking service using MongoDB and XGBoost to improve marketplace relevance, achieving response times of about 6 ms and a 99.3% successful request rate.

Key takeaways
  • Python was chosen as the primary language for building the search ranking service due to team familiarity, ecosystem maturity, and existing data science workflows

  • The service architecture uses a two-stage approach:

    • Stage 1: Candidate retrieval using Elasticsearch/Java to find the top 1000 matches
    • Stage 2: Re-ranking of those candidates by a Python service to produce the final top 20 results
  • MongoDB was selected over Redis for caching because it offers:

    • Native vector embedding support
    • Fast aggregation pipelines
    • Easy business logic integration
    • Better support for complex queries
  • Key technical implementation details:

    • FastAPI used as the web framework for its async support and automatic API documentation
    • Docker containers for deployment
    • Kubernetes for orchestration
    • Position bias handled through feature engineering
    • XGBoost 2.0 used for the ranking model
  • Critical requirements defined upfront:

    • 30ms latency requirement
    • Support for business logic/boosting
    • Pagination capability
    • High availability
    • Stateless design
  • Testing and deployment best practices:

    • Comprehensive unit tests
    • Local testing with Docker Compose
    • A/B testing capabilities
    • Small Docker image sizes
    • Automated CI/CD pipeline
  • Key learnings:

    • Start with clear requirements documentation
    • Consider long-term maintainability
    • Handle feature availability carefully
    • Use a cache-based approach for performance
    • Monitor business metrics closely
  • The project improved unique conversion metrics and enabled data scientists to deploy models independently without backend engineering bottlenecks

  • The service achieved a 99.3% successful request rate with response times of about 6 ms

  • Two-sided marketplace considerations required balancing seller visibility with buyer relevance through multi-objective optimization
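
The re-ranking stage from the takeaways above (score the retrieved candidates, apply business boosts, return one page of results) can be illustrated with a minimal, stateless sketch. This is not mobile.de's code: the `Candidate` fields, the `rerank` function, and `toy_score` are hypothetical names chosen for illustration, and `toy_score` merely stands in for a trained ranking model such as the XGBoost ranker mentioned in the talk.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Candidate:
    """One listing returned by the stage-1 retrieval (hypothetical shape)."""
    listing_id: str
    features: tuple[float, ...]  # assumed to be precomputed, e.g. read from a cache
    boost: float = 0.0           # assumed additive business-logic boost


def rerank(
    candidates: Sequence[Candidate],
    score_fn: Callable[[Sequence[float]], float],
    page: int = 1,
    page_size: int = 20,
) -> list[str]:
    """Stage 2: order candidates by model score plus boost, return one page.

    The function keeps no state between calls, so any replica can serve
    any request, including later pages of the same search.
    """
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    # sorted() is stable, so candidates with equal scores keep retrieval order.
    ordered = sorted(
        candidates,
        key=lambda c: score_fn(c.features) + c.boost,
        reverse=True,
    )
    start = (page - 1) * page_size
    return [c.listing_id for c in ordered[start:start + page_size]]


def toy_score(features: Sequence[float]) -> float:
    """Placeholder for a trained ranking model's prediction (assumption)."""
    return sum(features)


# Stage 1 stand-in: 1000 synthetic candidates instead of a real search query.
candidates = [Candidate(f"ad-{i}", (float(i % 7), 0.5)) for i in range(1000)]
first_page = rerank(candidates, toy_score)
second_page = rerank(candidates, toy_score, page=2)
```

Because pagination is derived from the full ordering on every call, the sketch trades some repeated work for statelessness; a production service would likely cache features or scores to stay within its latency budget.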