PyData Chicago October 2024 Meetup

Learn about emerging LLM security threats, from jailbreak attacks to data theft, and discover essential defensive strategies for protecting AI systems in production at PyData Chicago.

Key takeaways
  • Large Language Models (LLMs) are increasingly vulnerable to jailbreak attacks where adversaries can bypass safety restrictions through carefully crafted prompts

  • Adversarial attacks on AI systems can be performed through methods like Projected Gradient Descent (PGD) and Greedy Coordinate Gradient (GCG), which add carefully chosen noise to inputs to cause misclassification

  • Current LLM security risks include:

    • Model theft/weight stealing
    • Data leakage and exfiltration
    • Prompt injection attacks
    • Malicious payload steganography
    • Automated jailbreaking
  • As LLMs gain more agency and control (ability to take actions), the security risks and potential harms increase significantly

  • Security best practices:

    • Use model protection tools
    • Implement careful access controls
    • Monitor and validate model inputs
    • Consider hiring AI security experts for red team testing
    • Be cautious with untrusted user input
  • The AI security landscape is rapidly evolving with new vulnerabilities and attack vectors being discovered regularly

  • Current LLM security is an ongoing arms race between attackers and defenders, with no clear long-term solution yet

  • Companies deploying AI systems need to balance helpfulness/capabilities with safety/security controls

  • White box attacks (with model weight access) are generally more powerful but black box attacks are also possible through API access

  • Traditional adversarial robustness research from computer vision is being adapted for language models but faces new challenges