Bernice Waweru - Tricking Neural Networks: Explore Adversarial Attacks | PyData Global 2023
Learn how adversarial attacks trick neural networks, explore defense mechanisms, and understand security implications for machine learning models in this PyData Global talk.
- Adversarial attacks are carefully designed perturbations added to inputs that trick neural networks into producing incorrect outputs while remaining imperceptible to humans
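A common way to formalize this idea (notation mine, not from the talk): the attacker looks for a small perturbation δ, bounded by a perceptibility budget ε, that maximizes the model's loss and so flips its prediction.

```latex
\max_{\|\delta\|_{\infty} \le \epsilon} \; \mathcal{L}\bigl(f(x + \delta),\, y\bigr)
\qquad \text{so that} \qquad f(x + \delta) \ne f(x)
```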
- Neural networks are more vulnerable to adversarial attacks than traditional ML models like logistic regression, largely because the gradients they rely on during training can also be exploited to craft attacks
- Two main types of attacks:
  - White box attacks: the attacker knows the model architecture, parameters, and training data
  - Black box attacks: the attacker only sees the model's outputs and must design attacks based on its responses
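For example, a black-box attacker can only call the model and observe what comes back. The sketch below is hypothetical (the `predict` stub stands in for whatever API the victim model exposes); it searches for a small random perturbation that flips the returned label using nothing but queries.

```python
import numpy as np

rng = np.random.default_rng(0)

def predict(x):
    # Stub standing in for the victim model's API: the attacker sees only the label.
    return int(x.sum() > 0)

def black_box_attack(x, epsilon=0.1, max_queries=1000):
    """Query-only attack: try small random perturbations until the predicted label flips."""
    original_label = predict(x)
    for _ in range(max_queries):
        delta = rng.uniform(-epsilon, epsilon, size=x.shape)
        if predict(x + delta) != original_label:
            return x + delta          # adversarial example found
    return None                       # no flip within the query budget

x = rng.normal(size=10) * 0.05        # an input close to the toy decision boundary
print("attack succeeded:", black_box_attack(x) is not None)
```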
- Adversarial attacks are transferable: attacks designed for one model can often successfully fool other models, even those with different architectures
- Key defense mechanisms:
  - Input sanitization: validate and clean user inputs before processing
  - Adversarial training: include adversarial examples in the training data to build robustness (see the sketch after this list)
  - Implement multiple defense methods, as no single approach is completely effective
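A minimal sketch of adversarial training, assuming a PyTorch classifier; the model, optimizer, and data loader are placeholders, and the one-step gradient-sign perturbation mirrors the generation recipe summarized later.

```python
import torch
import torch.nn.functional as F

def craft_adversarial(model, x, y, epsilon):
    """One gradient-sign step on the input to create a perturbed training example."""
    x_adv = x.clone().detach().requires_grad_(True)
    F.cross_entropy(model(x_adv), y).backward()
    return (x_adv + epsilon * x_adv.grad.sign()).detach()

def adversarial_training_step(model, optimizer, x, y, epsilon=0.03):
    """Train on clean and adversarial examples together to build robustness."""
    x_adv = craft_adversarial(model, x, y, epsilon)
    optimizer.zero_grad()                      # clear grads left over from crafting
    loss = F.cross_entropy(model(x), y) + F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()

# Hypothetical usage with any classifier and (x, y) batches:
# model = MyNet()
# optimizer = torch.optim.Adam(model.parameters())
# for x, y in train_loader:
#     adversarial_training_step(model, optimizer, x, y)
```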
- Generating adversarial attacks often involves (a minimal sketch follows this list):
  - Leveraging the model's gradients with respect to the input
  - Finding minimal input changes that maximize the loss function
  - Creating imperceptible perturbations that cause misclassification
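A minimal PyTorch sketch of this recipe (the talk does not tie it to a specific library; this is the classic fast-gradient-sign style of attack, where one gradient step with respect to the input is scaled by a small ε):

```python
import torch
import torch.nn.functional as F

def generate_adversarial(model, x, y_true, epsilon=0.01):
    """Nudge the input in the direction that increases the loss, keeping the change tiny."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y_true)   # loss the attacker wants to increase
    loss.backward()                                # gradient of the loss w.r.t. the input
    perturbation = epsilon * x_adv.grad.sign()     # minimal, bounded change per feature
    return torch.clamp(x_adv + perturbation, 0.0, 1.0).detach()  # stay in a valid input range

# Hypothetical usage with an image classifier and inputs scaled to [0, 1]:
# x_adv = generate_adversarial(model, x, y)
# print(model(x).argmax(dim=1), model(x_adv).argmax(dim=1))  # labels often differ
```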
- Adversarial attacks are particularly concerning for LLMs and production systems in critical domains like finance, where incorrect predictions could have significant consequences
- While computationally expensive to generate, adversarial examples pose a serious security risk, as motivated attackers can exploit these vulnerabilities
- Open-source models are especially vulnerable since their architectures and parameters are publicly available
- Successful attacks can occur through subtle changes such as adding imperceptible characters or changing a single word while preserving semantic meaning (see the text example below)
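As an illustration of this kind of change (a hypothetical sketch, not code from the talk): insert an invisible character inside a word, or swap a word for a synonym, and check whether the label returned by a black-box classifier flips while the sentence still reads the same to a human.

```python
ZERO_WIDTH_SPACE = "\u200b"

def predict(text):
    # Stand-in for a black-box sentiment model: naive keyword matching.
    return "negative" if "terrible" in text.lower() else "positive"

def attack(text, synonyms):
    """Try imperceptible character insertions and single-word swaps until the label flips."""
    original = predict(text)
    candidates = []
    for word in text.split():
        # Invisible character inside the word: looks identical to a human reader.
        candidates.append(text.replace(word, word[0] + ZERO_WIDTH_SPACE + word[1:], 1))
        # Single-word synonym swap: the semantic meaning is preserved.
        if word in synonyms:
            candidates.append(text.replace(word, synonyms[word], 1))
    for candidate in candidates:
        if predict(candidate) != original:
            return candidate
    return None

review = "The service was terrible and slow."
print(attack(review, {"terrible": "dreadful"}))   # visually identical text, flipped label
```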