PyData Chicago October 2024 Meetup

Security Ai

Learn about emerging LLM security threats, from jailbreak attacks to data theft, and discover essential defensive strategies for protecting AI systems in production at PyData Chicago.

Key takeaways

Large Language Models (LLMs) are increasingly vulnerable to jailbreak attacks where adversaries can bypass safety restrictions through carefully crafted prompts
Adversarial attacks on AI systems can be performed through methods like Projected Gradient Descent (PGD) and Greedy Coordinate Gradient (GCG), which add carefully chosen noise to inputs to cause misclassification
Current LLM security risks include:
- Model theft/weight stealing
- Data leakage and exfiltration
- Prompt injection attacks
- Malicious payload steganography
- Automated jailbreaking
As LLMs gain more agency and control (ability to take actions), the security risks and potential harms increase significantly
Security best practices:
- Use model protection tools
- Implement careful access controls
- Monitor and validate model inputs
- Consider hiring AI security experts for red team testing
- Be cautious with untrusted user input
The AI security landscape is rapidly evolving with new vulnerabilities and attack vectors being discovered regularly
Current LLM security is an ongoing arms race between attackers and defenders, with no clear long-term solution yet
Companies deploying AI systems need to balance helpfulness/capabilities with safety/security controls
White box attacks (with model weight access) are generally more powerful but black box attacks are also possible through API access
Traditional adversarial robustness research from computer vision is being adapted for language models but faces new challenges

PyData Chicago October 2024 Meetup

More talks