Confused Learning: Supply Chain Attacks through Machine Learning Models

Python Security

Learn how ML models are exploited for supply chain attacks via Lambda layers and metadata files. Discover key attack vectors, detection gaps, and defensive strategies for ML environments.

Key takeaways

Machine learning models in several formats can carry malware; Keras/TensorFlow models are particularly exposed through Lambda layers and metadata files
Supply chain attacks through ML models require no special ML expertise; basic Python knowledge and the ability to operate a C2 framework are sufficient
ML environments are high-value targets due to direct access to business crown jewels (data), broad permissions, and low security visibility
Common attack vectors include:
- Public model repositories like Hugging Face
- Organization registration and social engineering
- Poisoned models in development/testing environments
- Lambda layer code execution
- Metadata file manipulation
Current detection capabilities are limited:
- No standardized model evaluation process
- Lack of consistent model documentation
- Traditional AV struggles with large model files
- Few purpose-built security tools
Defensive recommendations:
- Environmental hardening of ML pipelines
- Implementing proper access controls and logging
- Using static analysis tools for model inspection
- Avoiding pickle-based models
- Establishing model evaluation procedures
Model infection rates are relatively low (about 1.7% contained code), but the impact can be severe because of privileged access and persistence
Improved security tooling is needed, including:
- Better static analysis capabilities
- Standardized model cards
- DFIR tooling specific to ML environments
- YARA/Semgrep rules for model scanning
ML teams often prioritize experimentation over security, leading to reduced security controls and increased attack surface
Supply chain attacks through ML models can be more persistent and stealthy than traditional phishing attacks
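
The static inspection mentioned in the takeaways can be approximated with the Python standard library alone. The sketch below is illustrative and not the tooling used in the talk. It assumes a Keras model saved in the `.keras` zip format with a top-level `config.json` (legacy HDF5 and SavedModel files are not covered), and it treats a fixed set of pickle opcodes as "risky". The names `find_lambda_layers`, `risky_pickle_opcodes`, and `RISKY_OPCODES` are hypothetical. Neither function loads or executes the model.

```python
import json
import pickletools
import zipfile

# Assumption: these opcodes are treated as risky because they can import
# or call arbitrary objects while a pickle is being loaded.
RISKY_OPCODES = {
    "GLOBAL", "STACK_GLOBAL", "REDUCE", "INST", "OBJ", "NEWOBJ", "NEWOBJ_EX",
}


def find_lambda_layers(keras_archive):
    """Return the names of Lambda layers declared in a .keras archive.

    Assumes the archive is a zip file containing config.json.
    Only the JSON is parsed; no model code is run.
    """
    with zipfile.ZipFile(keras_archive) as zf:
        config = json.loads(zf.read("config.json"))

    found = []

    def walk(node):
        # Recursively visit every dict and list in the model config.
        if isinstance(node, dict):
            if node.get("class_name") == "Lambda":
                layer_cfg = node.get("config")
                name = layer_cfg.get("name") if isinstance(layer_cfg, dict) else None
                found.append(name or "<unnamed>")
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(config)
    return found


def risky_pickle_opcodes(data):
    """Return the sorted names of risky opcodes present in pickle bytes.

    pickletools.genops only disassembles the stream; nothing is unpickled.
    """
    return sorted(
        {op.name for op, _arg, _pos in pickletools.genops(data)
         if op.name in RISKY_OPCODES}
    )
```

A hit from either function is a reason for manual review, not proof of compromise: ordinary pickled class instances also use these opcodes, and Lambda layers have legitimate uses.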
