Bernice Waweru - Tricking Neural Networks: Explore Adversarial Attacks | PyData Global 2023
Learn how adversarial attacks trick neural networks, explore defense mechanisms, and understand security implications for machine learning models in this PyData Global talk.
- Adversarial attacks are carefully designed perturbations added to inputs that trick neural networks into producing incorrect outputs while remaining imperceptible to humans (a standard formalization is sketched below)
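One common way to state this formally, sketched here rather than quoted from the talk: the attacker looks for a perturbation that stays within a small budget epsilon, so a human cannot see it, while maximizing the model's loss.

```latex
% Untargeted adversarial example for a model f with loss L, clean input x and label y:
% maximize the loss over perturbations delta inside a small L-infinity ball of radius
% epsilon, so that the change to x remains imperceptible.
\delta^{\star} = \arg\max_{\|\delta\|_{\infty} \le \varepsilon}
    \mathcal{L}\bigl(f(x + \delta),\, y\bigr),
\qquad x_{\mathrm{adv}} = x + \delta^{\star}
```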
- Neural networks are more vulnerable to adversarial attacks than traditional ML models such as logistic regression, primarily due to their reliance on gradient descent during training
- Two main types of attacks (a black-box query sketch follows this list):
    - White-box attacks: the attacker knows the model architecture, parameters, and training data
    - Black-box attacks: the attacker only has access to model outputs and must design attacks based on those responses
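To make the black-box setting concrete, here is a minimal query-only sketch in Python. It assumes `model` is a callable returning class probabilities for inputs in [0, 1]; the random-search strategy is just a simple illustration, not a method prescribed in the talk.

```python
import numpy as np

def black_box_attack(model, x, true_label, eps=0.05, n_queries=500, rng=None):
    """Query-only attack sketch: try random perturbations inside an L-inf ball
    of radius eps and keep the first one that flips the model's prediction.
    Only model outputs are used (no gradients, parameters, or training data)."""
    if rng is None:
        rng = np.random.default_rng(0)
    for _ in range(n_queries):
        delta = rng.uniform(-eps, eps, size=x.shape)
        x_adv = np.clip(x + delta, 0.0, 1.0)        # stay in the valid input range
        if np.argmax(model(x_adv)) != true_label:   # success: prediction changed
            return x_adv
    return None  # no adversarial example found within the query budget
```

A white-box attacker would skip the guessing entirely and follow the model's gradients, as in the generation sketch further below.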
 
- Adversarial attacks are transferable: attacks designed for one model can often fool other models, even those with different architectures
- Key defense mechanisms (an adversarial-training sketch follows this list):
    - Input sanitization: validate and clean user inputs before processing
    - Adversarial training: include adversarial examples in the training data to build robustness
    - Layered defenses: combine multiple defense methods, since no single approach is completely effective
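A minimal sketch of the adversarial-training idea in PyTorch. The single-step FGSM perturbation, the [0, 1] input range, and the equal weighting of clean and adversarial loss are illustrative assumptions rather than settings from the talk.

```python
import torch
import torch.nn.functional as F

def adversarial_training_step(model, optimizer, x, y, eps=0.03):
    """One training step that mixes clean and adversarially perturbed examples."""
    # Craft adversarial versions of the current batch (single-step FGSM).
    x_req = x.clone().detach().requires_grad_(True)
    F.cross_entropy(model(x_req), y).backward()
    x_adv = (x_req + eps * x_req.grad.sign()).clamp(0, 1).detach()

    # Train on both clean and adversarial inputs so the model learns to resist attacks.
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x), y) + F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```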
 
- Generating adversarial attacks often involves (a gradient-based sketch follows this list):
    - Leveraging the same gradient machinery used in gradient descent, applied to the input rather than the weights
    - Finding minimal input changes that maximize the loss function
    - Creating imperceptible perturbations that cause misclassification
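A sketch of that recipe in PyTorch, using an iterative projected-gradient (PGD-style) attack as the illustration; it assumes image-like inputs normalized to [0, 1] and a classification loss.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=0.03, step=0.007, n_steps=10):
    """Iterative gradient-based attack: repeatedly take a small step in the
    direction that increases the loss, then project back into an L-inf ball
    of radius eps around the original input so the change stays imperceptible."""
    x_orig = x.clone().detach()
    x_adv = x_orig.clone()
    for _ in range(n_steps):
        x_adv = x_adv.clone().detach().requires_grad_(True)
        F.cross_entropy(model(x_adv), y).backward()  # loss we want to maximize
        with torch.no_grad():
            x_adv = x_adv + step * x_adv.grad.sign()            # ascend the loss
            x_adv = x_orig + (x_adv - x_orig).clamp(-eps, eps)  # project to the eps-ball
            x_adv = x_adv.clamp(0, 1)                           # keep inputs valid
    return x_adv.detach()
```

A larger eps makes the attack succeed more often but also makes the perturbation easier for a human to notice, which is the trade-off behind the imperceptibility requirement.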
 
- Adversarial attacks are particularly concerning for LLMs and for production systems in critical domains such as finance, where incorrect predictions could have significant consequences
- While computationally expensive to generate, adversarial examples pose a serious security risk, as motivated attackers can exploit these vulnerabilities
- Open-source models are especially vulnerable since their architectures and parameters are publicly available
- Successful attacks can occur through subtle changes, such as adding imperceptible characters or changing a single word while maintaining semantic meaning (a toy text example is sketched below)
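A toy illustration of such text perturbations. The zero-width-character insertion and single-word swap below are generic examples of the idea, not attacks demonstrated in the talk, and the classifier mentioned in the final comment is hypothetical.

```python
ZERO_WIDTH_SPACE = "\u200b"  # invisible when rendered, but it changes the raw character stream

def insert_invisible_chars(text: str, every: int = 4) -> str:
    """Insert zero-width spaces between short chunks; the string looks identical
    to a human reader but may tokenize very differently for a model."""
    return ZERO_WIDTH_SPACE.join(text[i:i + every] for i in range(0, len(text), every))

def swap_single_word(text: str, target: str, synonym: str) -> str:
    """Replace one word with a near-synonym, preserving the human-perceived meaning."""
    return text.replace(target, synonym, 1)

original = "The company reported strong quarterly earnings"
variants = [
    insert_invisible_chars(original),
    swap_single_word(original, "strong", "robust"),
]
# A hypothetical text classifier `classify(text)` might label `original` correctly
# yet change its prediction on one of these visually or semantically similar variants.
```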