Words as weapons: The dark arts of Prompt Engineering by Jeroen Egelmeers

Learn about AI security risks and prompt injection attacks, including social engineering of models, bypassing restrictions, and best practices for secure AI implementation.

Key takeaways

Prompt injection can trick AI models by inserting hidden instructions in text, images or system prompts to bypass guardrails and restrictions
Social engineering tactics work on AI models similar to humans - models can be manipulated through emotional appeals and misdirection
Many companies are implementing AI systems without proper security considerations, like using LLMs to automatically process invoices or scan CVs without human oversight
System prompts and guardrails can be bypassed through techniques like:
- Using ASCII art or white text to hide restricted words
- Confusing the model by rephrasing banned topics
- Overflowing context windows with large amounts of text
- Injecting contradictory instructions
Custom GPTs and public AI interfaces pose security risks as they may leak sensitive information or be manipulated through prompt injection
Critical security practices when using AI:
- Always have human oversight/verification
- Don’t automate sensitive processes entirely
- Carefully validate AI system outputs
- Consider data privacy when using public AI tools
- Implement proper guardrails and restrictions
The rapid evolution of AI technology means security measures need constant updating as new vulnerabilities are discovered
Companies should thoroughly test AI systems for potential exploits before deployment in production
Proper prompt engineering knowledge is essential for both implementing AI safely and defending against adversarial prompts
Educational understanding of adversarial prompting helps developers build more secure AI systems

Words as weapons: The dark arts of Prompt Engineering by Jeroen Egelmeers

More talks