Talks - Yury Selivanov: Overcoming GIL with subinterpreters and immutability

Learn how Python subinterpreters and immutable data structures can achieve true parallelism, with insights on efficient data sharing and overcoming GIL limitations.

Key takeaways
  • Python subinterpreters allow true parallelism by running multiple Python interpreters side by side in the same process, each with its own GIL

  • Shared immutable data structures enable efficient data sharing between subinterpreters without copying or pickling overhead

  • The MemHive library implements efficient immutable data structures using a HAMT (hash array mapped trie), a data structure with O(log n) lookups and updates

  • Structural sharing makes it cheap to update immutable collections: only the changed nodes in the tree are copied, and the unchanged parts are reused

  • For collections with millions of elements, structural sharing is significantly faster than pickling when passing data between subinterpreters

  • The architecture uses three levels:

    • Level 1: Basic functions
    • Level 2: Queues and synchronization primitives
    • Level 3: AsyncIO bridge

  • Immutable collections can be safely accessed across subinterpreters without locks since they cannot be modified

  • The implementation uses trees behind the scenes but exposes a simple dict-like API to users

  • Performance scales well: adding a single item to a large collection requires copying only a few nodes rather than the entire structure

  • While still at the prototype stage, the approach shows promise for CPU-intensive Python applications that need true parallelism
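
The takeaways about structural sharing and the dict-like API can be illustrated with a small sketch. This is not MemHive's code or API: the PMap class and its helpers are hypothetical names made up for illustration, and the tree is a simplified hash trie with fixed 32-slot nodes rather than a real HAMT with bitmap-compressed nodes. It only aims to show that an update copies the nodes along one path and reuses everything else.

```python
from typing import Any, Hashable, Optional, Set

BITS = 5                      # hash bits consumed per tree level
WIDTH = 1 << BITS             # 32 slots per node
MASK = WIDTH - 1


class Leaf:
    """All (key, value) pairs whose 64-bit hashes are identical."""
    __slots__ = ("hash", "items")

    def __init__(self, h: int, items: tuple):
        self.hash = h
        self.items = items


class Node:
    """Interior node: a tuple of WIDTH slots holding None, a Leaf, or a Node."""
    __slots__ = ("slots",)

    def __init__(self, slots: tuple):
        self.slots = slots


_EMPTY = Node((None,) * WIDTH)


def _hash(key: Hashable) -> int:
    return hash(key) & 0xFFFFFFFFFFFFFFFF   # force a non-negative 64-bit value


def _with_slot(node: Node, idx: int, child) -> Node:
    # Copy one node, replacing a single slot; the other slots are reused as-is.
    return Node(node.slots[:idx] + (child,) + node.slots[idx + 1:])


def _assoc(node: Node, shift: int, h: int, key, value) -> Node:
    idx = (h >> shift) & MASK
    child = node.slots[idx]
    if child is None:
        new_child = Leaf(h, ((key, value),))
    elif isinstance(child, Node):
        new_child = _assoc(child, shift + BITS, h, key, value)
    elif child.hash == h:
        # Same hash: replace or append within the leaf's bucket.
        kept = tuple(kv for kv in child.items if kv[0] != key)
        new_child = Leaf(h, kept + ((key, value),))
    else:
        # Different hashes share this slot: push the old leaf one level down.
        nxt = shift + BITS
        sub = _with_slot(_EMPTY, (child.hash >> nxt) & MASK, child)
        new_child = _assoc(sub, nxt, h, key, value)
    return _with_slot(node, idx, new_child)


class PMap:
    """Hypothetical immutable map: set() returns a new map, self is unchanged."""
    __slots__ = ("_root",)

    def __init__(self, root: Node = _EMPTY):
        self._root = root

    def set(self, key: Hashable, value: Any) -> "PMap":
        return PMap(_assoc(self._root, 0, _hash(key), key, value))

    def __getitem__(self, key: Hashable) -> Any:
        h, shift, cur = _hash(key), 0, self._root
        while isinstance(cur, Node):
            cur = cur.slots[(h >> shift) & MASK]
            shift += BITS
        if cur is not None and cur.hash == h:
            for k, v in cur.items:
                if k == key:
                    return v
        raise KeyError(key)

    def __contains__(self, key: Hashable) -> bool:
        try:
            self[key]
        except KeyError:
            return False
        return True


def node_ids(node, seen: Optional[Set[int]] = None) -> Set[int]:
    """Collect the identities of all interior nodes reachable from `node`."""
    seen = set() if seen is None else seen
    if isinstance(node, Node):
        seen.add(id(node))
        for child in node.slots:
            node_ids(child, seen)
    return seen


big = PMap()
for i in range(10_000):
    big = big.set(i, i * i)

bigger = big.set("extra", 1)          # big itself is not modified

old_nodes = node_ids(big._root)
new_nodes = node_ids(bigger._root)
print("interior nodes in the old tree:", len(old_nodes))
print("nodes copied for the update:   ", len(new_nodes - old_nodes))
```

Running the sketch shows that the update allocates only the nodes on the path from the root to the changed slot, while the two maps share every other node. Because neither map can be mutated, that shared portion is the part that could in principle be read by several interpreters without locks, which is the property the talk relies on.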