DjangoCon Europe 2024 | Careful what you search for!

Learn how to carefully use regular expressions in your applications to avoid performance issues and security vulnerabilities, exploring the theoretical foundations and practical considerations of DFA, NFA, and backtracking.

Key takeaways
  • Regular languages can be defined using deterministic finite automata (DFA) or non-deterministic finite automata (NFA) with the choice operator (noop).
  • DFA can be converted to NFA, but not the other way around. Backtracking in NFA makes it slower and more prone to security vulnerabilities.
  • Singleton languages, which have a single string, can be matched with a DFA in linear time.
  • Regex libraries often use backtracking, which can lead to catastrophic slowdowns in the case of repeated queries.
  • Careful handling of non-determinism is necessary to avoid performance issues.
  • NFA can be executed in parallel, but this does not necessarily remove the vulnerability to backtracking.
  • A linear time promise must be ensured to avoid excessive computation.
  • Regular expressions have been around since the 1960s, but the theoretical foundation is essential to understand their limitations and vulnerabilities.
  • A deterministic state machine, like a DFA, can be constructed from a regular language, but it may not be as efficient as an NFA.
  • Implementation of NFA can be done with backtracking, which makes it slower, or with parallel execution, which can be more efficient.
  • Union, concatenation, and star operations can be combined to create more complex regular languages.
  • Fixing vulnerabilities requires careful handling of non-determinism and linear time performance.
  • RedoS, a type of Regex denial of service, can be caused by backtracking and slow performance.
  • Regex libraries should prioritize linear time performance to avoid security vulnerabilities.