Build TikTok's Personalized Real-Time Recommendation System in Python with Hopsworks

Learn how to build TikTok-style video recommendations in Python with Hopsworks, covering real-time feature engineering, model training, and personalized content serving.

Key takeaways
  • TikTok’s recommender system uses a two-tower model architecture with separate encoders for user queries and videos to create embeddings in the same vector space

  • The system is composed of three main pipelines:

    • Feature pipeline: processes user interactions and video metadata
    • Training pipeline: creates and updates recommendation models
    • Inference pipeline: handles real-time predictions and serving
  • Key features are stored in a feature store (Hopsworks), which maintains:

    • User features (age, country, gender)
    • Video features (category, views, likes, length)
    • Interaction data between users and videos
  • The recommendation process has two main phases:

    • Retrieval: Uses vector similarity search to find hundreds of candidate videos
    • Ranking: Personalizes and orders candidates based on user preferences
  • The system maintains a fast feedback loop by:

    • Quickly logging user interactions (views, likes, watch time)
    • Updating feature values within seconds
    • Using fresh features for next predictions
  • Models are implemented using:

    • TensorFlow for embedding models
    • CatBoost for the ranking model
    • Vector index for similarity search
  • The infrastructure uses:

    • Kafka for event streaming
    • KServe for model serving
    • Feature store for data management
    • Vector database for embeddings
  • The system achieves personalization through:

    • Recent user activity history
    • User demographic features
    • Video metadata and engagement metrics
    • Interaction patterns
  • Model training happens on regular schedules:

    • Embedding models updated periodically
    • Ranking models retrained frequently
    • Vector index updated with new videos
  • Data validation rules ensure data quality:

    • Value range checks
    • Data type validation
    • Business logic constraints
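
The two-phase recommendation process described above (retrieval by vector similarity, then ranking) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: random vectors stand in for the embeddings that trained two-tower encoders would produce, a brute-force dot product stands in for a vector index, and a hand-written weighted sum stands in for a trained CatBoost ranker. The function names `retrieve` and `rank`, the feature columns, and the weights are hypothetical and are not part of Hopsworks or any other library.

```python
import numpy as np


def retrieve(user_emb, video_embs, k):
    """Retrieval phase: return the indices of the k videos whose embeddings
    have the highest dot-product similarity with the user embedding.
    Brute force here; a real system would query a vector index instead."""
    scores = video_embs @ user_emb
    top = np.argsort(-scores)[:k]
    return top, scores[top]


def rank(candidate_ids, video_features, weights):
    """Ranking phase: order the retrieved candidates by a weighted feature
    score. This linear score is a stand-in for a trained ranking model."""
    scores = video_features[candidate_ids] @ weights
    order = np.argsort(-scores)
    return candidate_ids[order], scores[order]


rng = np.random.default_rng(0)
n_videos, dim = 1000, 16

# Assumed inputs: both towers map into the same 16-dimensional space.
video_embs = rng.normal(size=(n_videos, dim))
user_emb = rng.normal(size=dim)

# Hypothetical per-video features, scaled to [0, 1): views, likes.
video_features = rng.random(size=(n_videos, 2))
weights = np.array([0.3, 0.7])

candidates, sim_scores = retrieve(user_emb, video_embs, k=100)
ranked, rank_scores = rank(candidates, video_features, weights)

print("top 10 recommended video ids:", ranked[:10])
```

The split keeps the expensive per-candidate scoring confined to a small set: retrieval narrows the full catalog down to a few hundred candidates with a cheap similarity computation, and only those candidates are passed to the ranking step.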