Talks - Yury Selivanov: Overcoming GIL with subinterpreters and immutability

Yury Selivanov

Learn how Python subinterpreters and immutable data structures can achieve true parallelism, with insights on efficient data sharing and overcoming GIL limitations.

Key takeaways
  • Python subinterpreters allow true parallelism by running multiple Python interpreters side-by-side in the same process, each with its own GIL

  • Shared immutable data structures enable efficient data sharing between subinterpreters without copying or pickling overhead

  • The MemHive library implements efficient immutable data structures using a HAMT (Hash Array Mapped Trie), which gives O(log n) lookups and updates

  • Structural sharing makes it cheap to "update" an immutable collection: the new version copies only the changed nodes in the tree and reuses the unchanged parts

  • For collections with millions of elements, structural sharing is significantly faster than pickling when passing data between subinterpreters

  • The architecture uses three levels:

    • Level 1: Basic functions
    • Level 2: Queues and synchronization primitives
    • Level 3: AsyncIO bridge

  • Immutable collections can be safely accessed across subinterpreters without locks since they cannot be modified

  • The implementation uses trees behind the scenes but exposes a simple dict-like API to users

  • Performance scales well: adding a single item to a large collection requires copying only a few nodes rather than the entire structure

  • While still at the prototype stage, the approach shows promise for CPU-intensive Python applications needing true parallelism
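
The node-copying behaviour described in the takeaways can be illustrated with a small pure-Python hash trie. This is only a sketch under stated assumptions, not MemHive's implementation or API: the names `Node`, `PMap`, `_set`, and `all_nodes` are hypothetical, nodes use fixed 32-slot tuples instead of the bitmap-compressed nodes of a real HAMT, and keys whose full hashes collide are not handled. It shows a dict-like `set`/`get` interface on top of a tree, and counts how many nodes are freshly copied when one key is added to a 10,000-key map.

```python
BITS = 5            # consume 5 bits of the hash per level (32-way branching)
WIDTH = 1 << BITS
MASK = WIDTH - 1


class Node:
    """Trie node holding a tuple of 32 slots.

    Each slot is None, a (key, value) leaf tuple, or a child Node.
    """
    __slots__ = ("slots",)

    def __init__(self, slots=(None,) * WIDTH):
        self.slots = slots


def _set(node, shift, h, key, value):
    """Return a copy of `node` with key set; untouched children are reused."""
    idx = (h >> shift) & MASK
    slot = node.slots[idx]
    if slot is None or (not isinstance(slot, Node) and slot[0] == key):
        new = (key, value)                      # empty slot or same key
    elif isinstance(slot, Node):
        new = _set(slot, shift + BITS, h, key, value)
    else:
        # A different key already lives here: push both one level down.
        old_key, old_value = slot
        if hash(old_key) == h:
            raise NotImplementedError("full hash collisions are not handled in this sketch")
        child = _set(Node(), shift + BITS, hash(old_key), old_key, old_value)
        new = _set(child, shift + BITS, h, key, value)
    # Copy only this node; every other slot still points at the old subtrees.
    return Node(node.slots[:idx] + (new,) + node.slots[idx + 1:])


class PMap:
    """Hypothetical persistent map: set() returns a new map, self is unchanged."""

    def __init__(self, root=None):
        self._root = root if root is not None else Node()

    def set(self, key, value):
        return PMap(_set(self._root, 0, hash(key), key, value))

    def get(self, key, default=None):
        node, shift, h = self._root, 0, hash(key)
        while True:
            slot = node.slots[(h >> shift) & MASK]
            if slot is None:
                return default
            if isinstance(slot, Node):
                node, shift = slot, shift + BITS
                continue
            return slot[1] if slot[0] == key else default


def all_nodes(node):
    """Yield every Node reachable from `node`."""
    yield node
    for slot in node.slots:
        if isinstance(slot, Node):
            yield from all_nodes(slot)


old = PMap()
for i in range(10_000):
    old = old.set(i, str(i))
new = old.set(10_000, "value")      # `old` is still valid and unchanged

old_ids = {id(n) for n in all_nodes(old._root)}
new_nodes = list(all_nodes(new._root))
copied = [n for n in new_nodes if id(n) not in old_ids]
print(len(new_nodes), "nodes in the new map;", len(copied), "were copied")
```

Only the nodes on the path from the root to the changed slot are rebuilt, so the number of copied nodes grows with the depth of the tree rather than with the number of stored items. Because no existing node is ever mutated, a reader holding `old` never observes the change, which is the property that makes lock-free sharing plausible.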