Better search relevance using Learning to Rank at mobile.de [PyCon DE & PyData Berlin 2024]
Learn how mobile.de built a fast Python search ranking service using MongoDB & XGBoost to improve marketplace relevance, achieving 6ms response times & 99.3% success rate.
- Python was chosen as the primary language for the search ranking service because of team familiarity, ecosystem maturity, and existing data science workflows.
- The service architecture uses a two-stage approach:
  - Stage 1: Candidate retrieval with Elasticsearch/Java to find the top 1000 matches
  - Stage 2: Re-ranking in a Python service to produce the final top 20 results
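
The second stage can be sketched as a pure function over the candidates returned by the first stage. This is a minimal illustration, not the service's actual code: `Candidate`, `rerank`, and `score_fn` are hypothetical names, and `score_fn` stands in for the trained ranking model. Only the page size of 20 comes from the summary above.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Candidate:
    listing_id: str          # hypothetical identifier for a listing
    retrieval_score: float   # score assigned by the stage-1 search engine


def rerank(
    candidates: Sequence[Candidate],
    score_fn: Callable[[Candidate], float],
    page: int = 0,
    page_size: int = 20,
) -> List[Candidate]:
    """Re-score stage-1 candidates and return one page of results."""
    # Sort by model score, highest first. Ties fall back to listing_id so
    # page boundaries stay stable across independent, stateless requests.
    ranked = sorted(candidates, key=lambda c: (-score_fn(c), c.listing_id))
    start = page * page_size
    return ranked[start:start + page_size]


# Usage: 1000 stage-1 candidates in, the first page of 20 out.
candidates = [Candidate(f"ad-{i:04d}", float(i)) for i in range(1000)]
first_page = rerank(candidates, score_fn=lambda c: c.retrieval_score)
```

Because the function keeps no state between calls, pagination only needs the page index from the request, which fits the stateless design requirement listed further down.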
 
- MongoDB was selected over Redis for caching because it offers:
  - Native vector embedding support
  - Fast aggregation pipelines
  - Easy business logic integration
  - Better support for complex queries
 
- Key technical implementation details:
  - FastAPI as the web framework, for async support and generated documentation
  - Docker containers for deployment
  - Kubernetes for orchestration
  - Position bias handled through feature engineering
  - XGBoost 2.0 for the ranking model
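
The summary does not say which feature engineering technique was used for position bias. One common option, shown here purely as an assumption, is to record the displayed position as a training feature and hold it constant at serving time so the model cannot favor listings for where they used to appear. The feature names and helper functions below are hypothetical.

```python
from typing import Dict, List

# Hypothetical feature layout; "position" is the displayed rank of a listing.
FEATURE_ORDER = ["price_score", "freshness", "position"]
NEUTRAL_POSITION = 0.0  # constant used for every candidate at inference time


def training_row(features: Dict[str, float], shown_position: int) -> List[float]:
    """Build a training vector that records where the listing was displayed."""
    row = dict(features, position=float(shown_position))
    return [row[name] for name in FEATURE_ORDER]


def inference_row(features: Dict[str, float]) -> List[float]:
    """Build a serving vector with the position feature held constant."""
    row = dict(features, position=NEUTRAL_POSITION)
    return [row[name] for name in FEATURE_ORDER]
```

With this layout the model can attribute part of a click to placement during training, while every candidate is scored as if it were shown in the same slot at inference.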
 
- Critical requirements defined upfront:
  - 30ms latency budget
  - Support for business logic and boosting
  - Pagination capability
  - High availability
  - Stateless design
 
- Testing and deployment best practices:
  - Comprehensive unit tests
  - Local testing with Docker Compose
  - A/B testing capabilities
  - Small Docker image sizes
  - Automated CI/CD pipeline
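
A/B testing in a stateless service is often done with deterministic hashing, so the same user always lands in the same variant without any stored assignment. The sketch below assumes that approach; the summary does not describe how the talk's A/B split works, and `assign_variant` and its parameters are hypothetical.

```python
import hashlib


def assign_variant(user_id: str, experiment: str, treatment_share: float = 0.5) -> str:
    """Deterministically map a user to 'control' or 'treatment'."""
    # Including the experiment name decorrelates assignments across experiments.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and scale it into [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "treatment" if bucket < treatment_share else "control"
```

Any replica of the service computes the same answer for the same inputs, so no shared session state is needed.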
 
- Key learnings:
  - Start with clear requirements documentation
  - Consider long-term maintainability
  - Handle feature availability carefully
  - Use a cache-based approach for performance
  - Monitor business metrics closely
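
Handling feature availability usually means deciding what the model sees when a cached value is missing. The following sketch assumes explicit per-feature defaults and uses a plain dictionary as a stand-in for the cache; the feature names, default values, and `build_feature_vector` are hypothetical and not taken from the talk.

```python
from typing import Dict, List, Mapping, Optional

# Hypothetical features, each with the fallback used when no value is cached.
FEATURE_DEFAULTS: Dict[str, float] = {
    "ctr_7d": 0.0,
    "price_rating": 0.5,
    "image_count": 0.0,
}


def build_feature_vector(
    listing_id: str,
    cache: Mapping[str, Mapping[str, Optional[float]]],
) -> List[float]:
    """Assemble a fixed-order feature vector, filling gaps with defaults."""
    cached = cache.get(listing_id, {})  # an unknown listing yields all defaults
    vector = []
    for name, default in FEATURE_DEFAULTS.items():
        value = cached.get(name)
        vector.append(default if value is None else float(value))
    return vector


# Usage: one listing with a partially populated cache entry.
cache = {"ad-1": {"ctr_7d": 0.12, "price_rating": None}}
vector = build_feature_vector("ad-1", cache)
```

An alternative is to pass missing values through as NaN, since XGBoost learns a default split direction for missing inputs; which option is better depends on how the model was trained.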
 
- The project improved unique conversion metrics and enabled data scientists to deploy models independently, without backend engineering bottlenecks.
- The service achieved a 99.3% successful request rate with response times of about 6ms.
- Two-sided marketplace considerations required balancing seller visibility with buyer relevance through multi-objective optimization.
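
The summary does not specify how the two objectives were combined. One simple form of multi-objective optimization, shown here only as an assumed illustration, is a weighted sum of a buyer-relevance score and a seller-visibility score. The function name, the inputs, and the default weight are all hypothetical.

```python
def blended_score(relevance: float, seller_visibility: float, weight: float = 0.8) -> float:
    """Combine two objectives into one ranking score via a weighted sum."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    # weight = 1.0 ranks purely by buyer relevance; lower values give more
    # influence to the seller-visibility objective.
    return weight * relevance + (1.0 - weight) * seller_visibility
```

Sweeping `weight` in an A/B test is one way to see how much buyer relevance is traded away for each gain in seller exposure.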