Bernice Waweru - Tricking Neural Networks: Explore Adversarial Attacks | PyData Global 2023

Learn how adversarial attacks trick neural networks, explore defense mechanisms, and understand security implications for machine learning models in this PyData Global talk.

Key takeaways
  • Adversarial attacks are carefully designed perturbations added to inputs that trick neural networks into producing incorrect outputs while remaining imperceptible to humans

  • Neural networks are more vulnerable to adversarial attacks than traditional ML models such as logistic regression, largely because they are trained by gradient descent: the same gradients that drive learning let an attacker compute input perturbations that increase the loss

  • Two main types of attacks:

    • White box attacks: The attacker knows the model's architecture, parameters, and training data
    • Black box attacks: The attacker sees only the model's outputs and must design attacks from those responses (see the query-based sketch below)
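
To make the black box setting concrete, here is a minimal sketch of a query-only attack: the attacker repeatedly nudges the input at random and keeps any change that lowers the model's confidence in the original class. The toy model, input, query budget, and step size are assumptions for illustration, not code from the talk:

```python
import torch

torch.manual_seed(0)

# Hypothetical "victim" model: the attacker cannot inspect it,
# only query its output probabilities.
model = torch.nn.Sequential(torch.nn.Linear(20, 10), torch.nn.Softmax(dim=-1))
model.eval()

def query(x):
    # The attacker's only access: model outputs for a given input.
    with torch.no_grad():
        return model(x)

x = torch.randn(1, 20)                    # original, correctly handled input
label = query(x).argmax(dim=-1).item()    # class the attacker wants to break

best, best_conf = x.clone(), query(x)[0, label].item()
for _ in range(500):                      # assumed query budget
    candidate = best + 0.05 * torch.randn_like(best)  # small random nudge
    conf = query(candidate)[0, label].item()
    if conf < best_conf:                  # keep nudges that reduce confidence
        best, best_conf = candidate, conf

print("confidence in original class:", round(best_conf, 3))
print("misclassified:", query(best).argmax(dim=-1).item() != label)
```
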
  • Adversarial attacks are transferable: attacks designed for one model can often fool other models, even those with different architectures (a sketch follows below)
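
A hedged illustration of transferability: two models with different random initializations are trained on the same synthetic task, FGSM examples are crafted against the first, and both models are evaluated on them. The data, architectures, and epsilon are assumptions chosen for the sketch:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
X = torch.randn(2000, 20)
y = (X[:, :10].sum(dim=1) > 0).long()     # synthetic two-class task

def train(seed):
    torch.manual_seed(seed)               # different init per model
    m = torch.nn.Sequential(torch.nn.Linear(20, 32), torch.nn.ReLU(),
                            torch.nn.Linear(32, 2))
    opt = torch.optim.Adam(m.parameters(), lr=1e-2)
    for _ in range(300):
        opt.zero_grad()
        F.cross_entropy(m(X), y).backward()
        opt.step()
    return m

model_a, model_b = train(1), train(2)     # same task, different weights

# Craft FGSM examples against model_a only.
x = X.clone().requires_grad_(True)
F.cross_entropy(model_a(x), y).backward()
x_adv = (x + 0.5 * x.grad.sign()).detach()

# model_b never saw these examples, yet its accuracy typically drops too.
for name, m in (("model_a", model_a), ("model_b", model_b)):
    acc = (m(x_adv).argmax(dim=1) == y).float().mean().item()
    print(f"{name} accuracy on model_a's adversarial examples: {acc:.2f}")
```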

  • Key defense mechanisms:

    • Input sanitization: Validate and clean user inputs before processing
    • Adversarial training: Include adversarial examples in the training data to build robustness (see the sketch after this list)
    • Defense in depth: Combine multiple defense methods, since no single approach is completely effective
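
A minimal sketch of adversarial training, assuming a toy model and synthetic batches: each batch is augmented on the fly with its own FGSM perturbations, so the model is optimized to classify both clean and attacked inputs:

```python
import torch
import torch.nn.functional as F

model = torch.nn.Sequential(torch.nn.Linear(20, 32), torch.nn.ReLU(),
                            torch.nn.Linear(32, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
epsilon = 0.1                              # perturbation budget (assumed)

def fgsm(x, y):
    # One gradient step on the input, in the direction that increases loss.
    x = x.clone().detach().requires_grad_(True)
    F.cross_entropy(model(x), y).backward()
    return (x + epsilon * x.grad.sign()).detach()

for step in range(100):
    x = torch.randn(32, 20)                # stand-in training batch
    y = torch.randint(0, 10, (32,))
    x_adv = fgsm(x, y)                     # adversarial counterpart of the batch

    # Optimize on clean and adversarial examples together.
    loss = F.cross_entropy(model(x), y) + F.cross_entropy(model(x_adv), y)
    optimizer.zero_grad()                  # also clears grads left by fgsm()
    loss.backward()
    optimizer.step()
```
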
  • Generating adversarial attacks often involves:

    • Leveraging gradient descent
    • Finding minimal input changes that maximize loss function
    • Creating imperceptible perturbations that cause misclassification (see the sketch after this list)
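
One way to sketch that recipe is an iterative gradient-sign attack, often called projected gradient descent (PGD): repeatedly step the input in the direction that increases the loss, then clip the change back into a small epsilon-ball so it stays imperceptible. The model, input, and hyperparameters are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
model = torch.nn.Sequential(torch.nn.Linear(20, 32), torch.nn.ReLU(),
                            torch.nn.Linear(32, 10))

x_orig = torch.randn(1, 20)
y = model(x_orig).argmax(dim=-1)           # class we want the model to abandon

epsilon, step_size, steps = 0.1, 0.02, 20  # assumed attack hyperparameters
x = x_orig.clone()
for _ in range(steps):
    x = x.detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)    # loss on the current label
    loss.backward()
    x = x + step_size * x.grad.sign()      # ascend the loss surface
    # Project back into the epsilon-ball so the change stays tiny.
    x = x_orig + (x - x_orig).clamp(-epsilon, epsilon)

print("max perturbation:", (x - x_orig).abs().max().item())
print("prediction flipped:", model(x).argmax(dim=-1).item() != y.item())
```
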
  • Adversarial attacks are particularly concerning for LLMs and production systems in critical domains like finance, where incorrect predictions could have significant consequences

  • Although adversarial examples can be computationally expensive to generate, they pose a serious security risk because motivated attackers can exploit these vulnerabilities

  • Open source models are especially vulnerable because their architectures and parameters are publicly available, which enables white box attacks

  • Successful attacks can occur through subtle changes, such as adding imperceptible characters or changing a single word while preserving semantic meaning (see the sketch below)
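
To make that last point concrete, here is a small sketch of the character-level trick: inserting a zero-width space leaves the text looking identical to a human reader but changes the string the model receives. The example sentence is invented, and the final lines show the input-sanitization countermeasure from the defense list above:

```python
import unicodedata

original = "transfer the funds to account 4821"
# Insert a zero-width space inside a key word: invisible when rendered,
# but the string the model receives is no longer the same.
attacked = original.replace("funds", "fu\u200bnds")

print(attacked)                       # displays identically to the original
print(original == attacked)          # False: the model sees a different input
print(len(original), len(attacked))  # the hidden character changes the length

# One input-sanitization countermeasure: strip invisible "format" characters
# (Unicode category Cf covers zero-width spaces, joiners, BOMs, etc.).
cleaned = "".join(ch for ch in attacked if unicodedata.category(ch) != "Cf")
print(cleaned == original)           # True once sanitized
```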