Andrei Stoian - Open-source Machine Learning on Encrypted Data | PyData Amsterdam 2024

Learn how Concrete ML enables machine learning on encrypted data using fully homomorphic encryption (FHE). Discover secure applications in healthcare, LLMs, and data marketplaces, with PyTorch-like simplicity.

Key takeaways
  • Fully Homomorphic Encryption (FHE) enables computation on encrypted data without decrypting it, so the data stays protected while remaining usable

  • Concrete ML is an open-source library that makes machine learning on encrypted data accessible by mimicking the APIs of familiar frameworks like PyTorch and scikit-learn

  • Key FHE operations and techniques include:

    • Addition of encrypted values
    • Table lookups
    • Conversion of floating-point operations to integer operations
    • Working with quantized values (typically 8-16 bits)
  • Performance considerations:

    • 1,000-10,000x computation overhead compared to unencrypted operations
    • Ciphertext size can be up to 1,000x larger than cleartext
    • Compression can reduce the expansion factor to 10-20x
    • Latency of a few seconds is typical
  • Applications include:

    • Private inference
    • Spam filtering
    • DNA ancestry analysis
    • LLM fine-tuning
    • Secure data marketplaces
    • Healthcare data processing
  • Machine learning models supported:

    • Linear models
    • Decision trees
    • Neural networks
    • Large Language Models (with distributed computation)
  • Implementation requires:

    • Representative training data for calibration
    • Optimization of quantization parameters
    • Noise management in encrypted computations
    • Converting floating-point operations to integer operations
  • Security benefits:

    • Data remains encrypted during processing
    • Only the key holder can decrypt results
    • Protection against data leaks
    • Reduced regulatory compliance burden
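
The takeaways on quantization, calibration, addition, and table lookups can be illustrated with a minimal sketch in plain Python. Everything here runs on unencrypted integers and only mimics the arithmetic an FHE scheme would perform on ciphertexts. The function names (calibrate, quantize, dequantize, build_lookup_table) are hypothetical helpers written for this illustration and are not part of the Concrete ML API.

```python
# Illustrative only: cleartext integers stand in for encrypted values.
# All helper names are hypothetical, not Concrete ML functions.

def calibrate(samples, n_bits=8):
    """Derive a scale and zero point from representative data."""
    lo, hi = min(samples), max(samples)
    levels = 2 ** n_bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    zero_point = round(-lo / scale)
    return scale, zero_point

def quantize(x, scale, zero_point, n_bits=8):
    """Map a float to an unsigned n_bits integer, clamping out-of-range values."""
    q = round(x / scale) + zero_point
    return max(0, min(2 ** n_bits - 1, q))

def dequantize(q, scale, zero_point):
    """Map a quantized integer back to an approximate float."""
    return (q - zero_point) * scale

def build_lookup_table(fn, scale, zero_point, n_bits=8):
    """Precompute fn for every possible quantized input.

    A non-linear function then becomes a single table lookup
    indexed by an integer, with the output on the same integer grid.
    """
    return [
        quantize(fn(dequantize(q, scale, zero_point)), scale, zero_point, n_bits)
        for q in range(2 ** n_bits)
    ]

# Calibration uses representative data to pick the quantization range.
calibration = [-4.0, -1.0, 0.0, 2.5, 4.0]
scale, zp = calibrate(calibration)

# Addition works directly on the integers; the zero point is counted twice.
q_a, q_b = quantize(1.5, scale, zp), quantize(-0.25, scale, zp)
approx_sum = (q_a + q_b - 2 * zp) * scale

# ReLU expressed as a table lookup over all 256 possible 8-bit inputs.
relu_table = build_lookup_table(lambda v: max(v, 0.0), scale, zp)
relu_out = dequantize(relu_table[quantize(-2.0, scale, zp)], scale, zp)

print(f"sum is approximately {approx_sum:.4f}, relu(-2.0) gives {relu_out}")
```

The table has one entry per representable input, which is one reason low bit widths matter: each extra bit doubles the table size.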