From Text to Flaws: vulnerabilities in applications with Generative AI and LLMs - Paul Molin

Learn how attackers exploit LLM vulnerabilities like prompt injection and data leakage, and discover key defense techniques to build more secure AI applications.

Key takeaways
  • LLMs are both powerful and gullible - they can be manipulated through carefully crafted prompts while appearing to follow instructions

  • Key vulnerabilities in LLM applications:

    • Prompt injections allowing attackers to manipulate application behavior
    • Indirect data leakage through summarization tasks
    • Code execution risks when LLMs generate executable code
    • Multi-modal vulnerabilities through hidden text in images
    • Information extraction from custom GPTs
  • Primary defense techniques:

    • Dual LLMs pattern - using separate privileged and quarantined instances
    • Preflight prompts to validate inputs
    • Vector embeddings to detect malicious prompts
    • Escaping and sanitizing user inputs
    • Canary tokens for detecting data exfiltration
  • Best practices:

    • Limit LLM access to sensitive tools/APIs
    • Validate and sanitize all user inputs
    • Use established libraries for input handling
    • Implement monitoring and logging
    • Design for minimum blast radius
  • Additional challenges:

    • Cost implications of security measures
    • Difficulty distinguishing malicious from legitimate inputs
    • Balancing security with functionality
    • Handling multimodal inputs safely
    • Managing context length limitations
  • LLM applications require considering both traditional web security and novel AI-specific attack vectors