Bauer & Fraunholz - Insights from Two Shipmates sailing in the LLM CTF @ SaTML 2024

Bauer & Fraunholz

Explore key insights from an LLM security CTF, covering attack techniques, defense mechanisms, and best practices for protecting sensitive data in AI systems, with examples from the competition.

Key takeaways
  • LLMs struggle to reliably keep secrets, especially when attackers know the structure or format of the protected information

  • Common attack techniques include:

    • Jailbreaks and prompt manipulation
    • Asking for the secret in an encoded form (base64, hex, ASCII codes, Python arrays)
    • NATO phonetic alphabet encoding
    • Guided text generation
    • Combining multiple techniques in layers
  • Key defense mechanisms include:

    • Defense prompts (though often ineffective alone)
    • Python/algorithmic filters
    • LLM-based filters
    • Character substitution/transformation
    • Multiple defense layers combined
  • The EU AI Act requires:

    • Documented adversarial testing for LLMs
    • Implementation of cybersecurity protections
    • Testing and validation of LLM security measures
  • Best practices for LLM security:

    • Don’t store sensitive data in the LLM’s context
    • Filter sensitive data out before it reaches the LLM
    • Use multiple layers of defense
    • Regularly test security measures
    • Consider computational costs of security measures
  • The competition format (2 weeks, multiple defenses/targets) reflected real-world LLM security challenges

  • Automation played a crucial role in successful attacks, allowing teams to try multiple variations efficiently

  • Understanding token processing and LLM behavior patterns is essential for both attacks and defenses

  • Simple defense prompts asking LLMs to “forget” or “not reveal” secrets proved largely ineffective

  • Even with multiple security layers, determined attackers can often extract protected information through persistence and creative approaches
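
The encoding attacks and Python filters listed above can be illustrated with a minimal sketch. It is not the competition's code: the secret `aB3xQ9`, the function names `encode_variants`, `naive_filter`, and `layered_filter`, and the choice of encodings checked are all assumptions made for illustration. The sketch shows why a plain substring filter misses encoded leaks, and how a filter that checks several encodings catches more of them.

```python
import base64
import re

SECRET = "aB3xQ9"  # hypothetical secret placed in the model's context

NATO_WORDS = ("Alfa Bravo Charlie Delta Echo Foxtrot Golf Hotel India Juliett "
              "Kilo Lima Mike November Oscar Papa Quebec Romeo Sierra Tango "
              "Uniform Victor Whiskey Xray Yankee Zulu").split()
NATO = {word[0].lower(): word for word in NATO_WORDS}


def encode_variants(secret: str) -> dict:
    """Forms in which an attacker might ask the model to print the secret."""
    raw = secret.encode("ascii")
    return {
        "plain": secret,
        "base64": base64.b64encode(raw).decode("ascii"),
        "hex": raw.hex(),
        "python_array": str(list(raw)),
        # Digits have no entry in this table and are passed through as-is.
        "nato": " ".join(NATO.get(c.lower(), c) for c in secret),
    }


def naive_filter(output: str, secret: str) -> bool:
    """Return True if the output is blocked. Checks the plain secret only."""
    return secret in output


def layered_filter(output: str, secret: str) -> bool:
    """Return True if the output is blocked. Also checks a few encodings."""
    if secret.lower() in output.lower():
        return True
    variants = encode_variants(secret)
    if variants["base64"] in output or variants["hex"] in output.lower():
        return True
    # Interpret integers in the printable ASCII range as character codes.
    numbers = [int(n) for n in re.findall(r"\d+", output)]
    decoded = "".join(chr(n) for n in numbers if 32 <= n < 127)
    if secret in decoded:
        return True
    # Take the first character of each word to undo NATO-style spelling.
    initials = "".join(w[0] for w in re.findall(r"[A-Za-z0-9]+", output))
    return secret.lower() in initials.lower()


if __name__ == "__main__":
    for name, text in encode_variants(SECRET).items():
        print(f"{name:13} naive={naive_filter(text, SECRET)!s:5} "
              f"layered={layered_filter(text, SECRET)}")
```

In this sketch the naive filter blocks only the plain form, while the layered filter blocks all five. The layered filter still covers only the encodings it was written for, and its initial-letter check can produce false positives on ordinary text. This is consistent with the takeaway that layered defenses raise the cost of an attack without guaranteeing protection.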