Bauer & Fraunholz - Insights from Two Shipmates sailing in the LLM CTF @ SaTML 2024

Explore key insights from an LLM security CTF, covering attack techniques, defense mechanisms, and best practices for protecting sensitive data in AI systems. Real competition examples included.

Key takeaways

LLMs struggle to reliably keep secrets, especially when attackers know the structure/format of the protected information
Common attack techniques include:
- Jailbreaks and prompt manipulation
- Encoding (base64, hex, ASCII, Python arrays)
- NATO phonetic alphabet encoding
- Guided text generation
- Combining multiple techniques in layers
Key defense mechanisms include:
- Defense prompts (though often ineffective alone)
- Python/algorithmic filters
- LLM-based filters
- Character substitution/transformation
- Multiple defense layers combined
The EU AI Act requires:
- Documented adversarial testing for LLMs
- Implementation of cybersecurity protections
- Testing and validation of LLM security measures
Best practices for LLM security:
- Don’t store sensitive data in LLM context
- Implement proper data filtering before LLM processing
- Use multiple layers of defense
- Regularly test security measures
- Consider computational costs of security measures
The competition format (2 weeks, multiple defenses/targets) reflected real-world LLM security challenges
Automation played a crucial role in successful attacks, allowing teams to try multiple variations efficiently
Understanding token processing and LLM behavior patterns is essential for both attacks and defenses
Simple defense prompts asking LLMs to “forget” or “not reveal” secrets proved largely ineffective
Even with multiple security layers, determined attackers can often extract protected information through persistence and creative approaches

Bauer & Fraunholz - Insights from Two Shipmates sailing in the LLM CTF @ SaTML 2024

More talks