Bauer & Fraunholz - Insights from Two Shipmates sailing in the LLM CTF @ SaTML 2024

Explore key insights from an LLM security CTF, covering attack techniques, defense mechanisms, and best practices for protecting sensitive data in AI systems. Real competition examples included.

Key takeaways
  • LLMs struggle to reliably keep secrets, especially when attackers know the structure/format of the protected information

  • Common attack techniques include:

    • Jailbreaks and prompt manipulation
    • Encoding (base64, hex, ASCII, Python arrays)
    • NATO phonetic alphabet encoding
    • Guided text generation
    • Combining multiple techniques in layers
  • Key defense mechanisms include:

    • Defense prompts (though often ineffective alone)
    • Python/algorithmic filters
    • LLM-based filters
    • Character substitution/transformation
    • Multiple defense layers combined
  • The EU AI Act requires:

    • Documented adversarial testing for LLMs
    • Implementation of cybersecurity protections
    • Testing and validation of LLM security measures
  • Best practices for LLM security:

    • Don’t store sensitive data in LLM context
    • Implement proper data filtering before LLM processing
    • Use multiple layers of defense
    • Regularly test security measures
    • Consider computational costs of security measures
  • The competition format (2 weeks, multiple defenses/targets) reflected real-world LLM security challenges

  • Automation played a crucial role in successful attacks, allowing teams to try multiple variations efficiently

  • Understanding token processing and LLM behavior patterns is essential for both attacks and defenses

  • Simple defense prompts asking LLMs to “forget” or “not reveal” secrets proved largely ineffective

  • Even with multiple security layers, determined attackers can often extract protected information through persistence and creative approaches