Better search relevance using Learning to Rank at mobile.de [PyCon DE & PyData Berlin 2024]
Learn how mobile.de built a fast Python search ranking service using MongoDB & XGBoost to improve marketplace relevance, achieving 6ms response times & 99.3% success rate.
- Python was chosen as the primary language for building the search ranking service due to team familiarity, ecosystem maturity, and existing data science workflows
- The service architecture uses a two-stage approach:
  - Stage 1: candidate retrieval with Elasticsearch/Java to find the top 1000 matches
  - Stage 2: re-ranking with the Python service to produce the final top 20 results
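The two-stage flow can be sketched in a few lines. This is a minimal illustration, not the service's actual code: `retrieve_candidates` is a hypothetical in-memory stand-in for the Elasticsearch query, and `score_fn` stands in for the trained ranking model.

```python
from typing import Callable, Sequence


def retrieve_candidates(
    listings: Sequence[dict], make: str, limit: int = 1000
) -> list[dict]:
    """Stage 1 stand-in: a cheap filter that caps the candidate set at `limit`."""
    return [item for item in listings if item["make"] == make][:limit]


def rerank(
    candidates: Sequence[dict],
    score_fn: Callable[[dict], float],
    top_k: int = 20,
) -> list[dict]:
    """Stage 2: score every candidate and keep the `top_k` highest scores."""
    # sorted() is stable, so candidates with equal scores keep their stage 1 order.
    return sorted(candidates, key=score_fn, reverse=True)[:top_k]


def search(
    listings: Sequence[dict], make: str, score_fn: Callable[[dict], float]
) -> list[dict]:
    """Run both stages: broad retrieval first, then the more expensive re-ranking."""
    return rerank(retrieve_candidates(listings, make), score_fn)
```

Capping stage 1 at a fixed candidate count keeps the cost of stage 2 bounded regardless of how many listings match the query.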
- MongoDB was selected over Redis for caching because it offers:
  - Native vector embedding support
  - Fast aggregation pipelines
  - Easy business logic integration
  - Better support for complex queries
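A cache lookup for the candidate set might look like the sketch below. It assumes `collection` behaves like a PyMongo `Collection` and that each cached document stores its values under a `features` field keyed by listing `_id`; both the schema and the function name are hypothetical.

```python
from typing import Any, Iterable


def fetch_features(collection: Any, listing_ids: Iterable[str]) -> dict[str, dict]:
    """Load cached features for all candidates with a single query."""
    ids = list(listing_ids)
    # One round trip for the whole candidate set instead of one query per listing.
    cursor = collection.find({"_id": {"$in": ids}}, {"features": 1})
    found = {doc["_id"]: doc.get("features", {}) for doc in cursor}
    # Keep the caller's order and represent cache misses as empty dicts.
    return {listing_id: found.get(listing_id, {}) for listing_id in ids}
```

Returning an entry for every requested ID, including misses, lets the caller decide how to handle listings that are not yet cached.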
- Key technical implementation details:
  - FastAPI as the web framework, for async support and documentation
  - Docker containers for deployment
  - Kubernetes for orchestration
  - Position bias handled through feature engineering
  - XGBoost 2.0 for the ranking model
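The summary does not say how position bias was encoded. One common approach, shown here purely as an assumption, is to feed the displayed position to the model as a feature during training and pin it to a constant at inference, so that the ranking reflects the other features only. The feature names and `NEUTRAL_POSITION` are hypothetical.

```python
from typing import Mapping, Optional

# Hypothetical feature layout; the real feature set is not given in the summary.
FEATURE_ORDER = ["price_score", "freshness", "position"]
NEUTRAL_POSITION = 1  # Assumed constant used for every candidate at inference.


def build_row(
    features: Mapping[str, float], position: Optional[int] = None
) -> list[float]:
    """Build one model input row in a fixed column order.

    Training: pass the position at which the listing was actually shown.
    Inference: omit it, so every candidate gets the same neutral position.
    """
    pos = NEUTRAL_POSITION if position is None else position
    values = {**features, "position": float(pos)}
    return [values[name] for name in FEATURE_ORDER]
```

Because every candidate shares the same position value at inference, that column cannot change the relative order of the results.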
- Critical requirements defined upfront:
  - Latency budget of 30 ms
  - Support for business logic/boosting
  - Pagination capability
  - High availability
  - Stateless design
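Pagination and statelessness fit together if each page is derived only from the request parameters. The sketch below is an assumption about how that could work, not the service's implementation; `paginate` and its parameters are hypothetical.

```python
from typing import Sequence


def paginate(ranked_ids: Sequence[str], page: int, page_size: int = 20) -> list[str]:
    """Return one page of an already ranked result list.

    The slice depends only on the arguments, so any replica can serve any page
    without keeping session state between requests.
    """
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    start = (page - 1) * page_size
    return list(ranked_ids[start:start + page_size])
```

This only yields consistent pages if the ranking for a given query is deterministic between requests.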
- Testing and deployment best practices:
  - Comprehensive unit tests
  - Local testing with Docker Compose
  - A/B testing capabilities
  - Small Docker image sizes
  - Automated CI/CD pipeline
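For A/B testing in a stateless service, variant assignment is often made deterministic by hashing a stable identifier. The summary does not describe the actual mechanism, so the following is a generic sketch; the experiment name and variant labels are hypothetical.

```python
import hashlib
from typing import Sequence


def assign_variant(
    user_id: str,
    experiment: str,
    variants: Sequence[str] = ("control", "ltr"),
) -> str:
    """Map a user to a variant deterministically, without storing any state."""
    # Salting with the experiment name decorrelates buckets across experiments.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]
```

The same user always lands in the same variant for a given experiment, regardless of which replica handles the request.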
- Key learnings:
  - Start with clear requirements documentation
  - Consider long-term maintainability
  - Handle feature availability carefully
  - Use cache-based approach for performance
  - Monitor business metrics closely
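Handling feature availability usually means deciding what the model sees when a cached value is missing. A minimal sketch, assuming default values are acceptable substitutes (the function name and feature names are hypothetical):

```python
from typing import Mapping, Optional


def complete_features(
    cached: Optional[Mapping[str, float]],
    defaults: Mapping[str, float],
) -> tuple[dict[str, float], list[str]]:
    """Fill gaps with defaults and report which features were missing."""
    cached = cached or {}
    missing = [name for name in defaults if name not in cached]
    # Only features named in `defaults` are passed on, in a stable order.
    features = {name: cached.get(name, default) for name, default in defaults.items()}
    return features, missing
```

Returning the list of missing names makes it possible to log or monitor how often defaults are used, instead of silently scoring on incomplete data.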
- The project improved unique conversion metrics and enabled data scientists to deploy models independently without backend engineering bottlenecks
- The service achieved a 99.3% successful request rate with ~6 ms response times
- Two-sided marketplace considerations required balancing seller visibility with buyer relevance through multi-objective optimization
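The summary does not say which multi-objective method was used. As one simple illustration, a linear blend of the two objectives (scalarization) shows how a single weight shifts the balance between buyer relevance and seller visibility. The function, score names, and default weight are assumptions.

```python
def blended_score(relevance: float, seller_score: float, weight: float = 0.8) -> float:
    """Combine buyer relevance and seller visibility into one ranking score.

    `weight` is the share given to buyer relevance; the rest goes to the seller side.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    return weight * relevance + (1.0 - weight) * seller_score


def rank_listings(listings: list[dict], weight: float = 0.8) -> list[dict]:
    """Order listings by the blended score, highest first."""
    return sorted(
        listings,
        key=lambda item: blended_score(item["relevance"], item["seller_score"], weight),
        reverse=True,
    )
```

With `weight=1.0` the ranking depends on buyer relevance alone; lowering it lets seller-side signals move listings up.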