Talks - Irit Katriel: CPython's Compilation Pipeline

Learn how CPython 3.13's compilation pipeline evolves with new stages between AST and bytecode, making the compiler more modular, testable, and maintainable. A deep dive by Irit Katriel.

Key takeaways
  • CPython 3.13's compilation pipeline introduces new stages between the AST and bytecode generation, improving modularity and testability

  • The main compilation stages are:

    • Tokenizer (converts source code to tokens)
    • Parser (builds AST)
    • AST Optimizer
    • Code Generation (produces instruction sequence)
    • Peephole Optimizer (optimizes the instruction sequence before final bytecode is emitted)
    • Assembler (creates final bytecode)
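
The stages above can be partly observed from ordinary Python code. The following is a minimal sketch, assuming only the public `tokenize`, `ast` and `dis` modules: the tokenizer and parser are called directly, while the AST optimizer, code generation, peephole optimizer and assembler all run inside the single `compile()` call, so only their combined result (the final bytecode) is visible here.

```python
import ast
import dis
import io
import tokenize

source = "x = 1 + 2\n"

# Stage 1: tokenizer. Source text becomes a stream of tokens.
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
token_names = [tokenize.tok_name[tok.type] for tok in tokens]

# Stage 2: parser. The source becomes an AST.
tree = ast.parse(source)
assignment = tree.body[0]

# Stages 3-6: AST optimizer, code generation, peephole optimizer, assembler.
# The public API does not expose these individually; compile() runs them all
# and returns the finished code object.
code = compile(tree, "<example>", "exec")
opnames = [ins.opname for ins in dis.get_instructions(code)]

# Running the code object confirms the pipeline preserved the program's meaning.
namespace = {}
exec(code, namespace)

print(token_names)
print(ast.dump(assignment))
print(opnames)
print(namespace["x"])
```

The exact opcode names printed depend on the CPython version, which is one reason tests that only look at final bytecode tend to be brittle.
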
  • The refactoring was primarily motivated by:

    • Need for better unit testing capabilities
    • Improving code maintainability
    • Making the compiler more modular and accessible
  • New pseudo-instructions were introduced as an intermediate representation between the AST and bytecode: code generation emits them, and later stages replace them with real instructions

  • The changes make the compiler more flexible for:

    • Writing targeted unit tests for specific optimizations
    • Hooking in alternative compiler implementations
    • Customizing individual compilation stages
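
For contrast, here is a sketch of what testing an optimization looks like when only the whole pipeline is available: compile complete source text and inspect the final bytecode. It uses only the public `compile()` and `dis` APIs; the helper name `binary_ops` is made up for this example. Checking constant folding this way works, but the test cannot say which stage did the folding, which is the kind of limitation that per-stage tests are meant to remove.

```python
import dis

def binary_ops(source):
    """Return the names of binary-operation instructions in the compiled source."""
    code = compile(source, "<test>", "exec")
    return [
        ins.opname
        for ins in dis.get_instructions(code)
        if ins.opname.startswith("BINARY_")
    ]

# Both operands are constants, so the compiler can fold the addition away.
folded = binary_ops("x = 'a' + 'b'")

# One operand is a name, so the addition has to stay in the bytecode.
unfolded = binary_ops("x = a + 'b'")

print(folded)
print(unfolded)
```
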
  • Key improvements for bytecode handling:

    • Better handling of jump instructions
    • Cleaner separation between logical jumps and physical offsets
    • Improved optimization passes
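
The separation between logical jumps and physical offsets can be illustrated with a toy assembler. This is a sketch only: the `Instr` class, the opcode names, the label name and the fixed two-byte instruction size are all invented for the example and are not CPython's data structures. The idea it shows is that earlier stages can refer to jump targets by label, and only the final step turns labels into numeric offsets.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Instr:
    opname: str
    target: Optional[str] = None  # logical jump target (a label name), if any
    label: Optional[str] = None   # label attached to this instruction, if any

def resolve_labels(instrs, unit_size=2):
    """Replace label names with offsets, assuming fixed-size instructions."""
    # First pass: record the physical offset of every labelled instruction.
    offsets = {
        ins.label: index * unit_size
        for index, ins in enumerate(instrs)
        if ins.label is not None
    }
    # Second pass: emit (offset, opname, resolved target offset or None).
    return [
        (
            index * unit_size,
            ins.opname,
            offsets[ins.target] if ins.target is not None else None,
        )
        for index, ins in enumerate(instrs)
    ]

# Hypothetical program: jump to "else_branch" when a condition is false.
program = [
    Instr("LOAD_X"),
    Instr("JUMP_IF_FALSE", target="else_branch"),
    Instr("RETURN_ONE"),
    Instr("RETURN_TWO", label="else_branch"),
]

resolved = resolve_labels(program)
for row in resolved:
    print(row)
```

Because optimizations operate on the labelled form, they can insert or delete instructions without recomputing any offsets; that bookkeeping happens once, at the end.
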
  • Testing capabilities are exposed through the internal _testinternalcapi module, which is not an official part of the standard library
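
Since the internal test module (importable as `_testinternalcapi` in CPython builds that ship the test extensions) is not a supported API, any code touching it should probe before use. The sketch below only checks for availability. The helper names listed are assumptions based on what CPython's own test helpers appear to use; they may be renamed, change signature, or be missing entirely in a given build.

```python
import importlib

# Assumed names of the per-stage helpers. These are internal and unsupported,
# so treat the list as a guess to be verified, not as a documented interface.
ASSUMED_HELPERS = ("compiler_codegen", "optimize_cfg", "assemble_code_object")

def probe_stage_helpers():
    """Report which assumed helpers exist in this interpreter, without calling them."""
    try:
        module = importlib.import_module("_testinternalcapi")
    except ImportError:
        # Many distributions do not ship the internal test module at all.
        module = None
    return {
        name: module is not None and hasattr(module, name)
        for name in ASSUMED_HELPERS
    }

available = probe_stage_helpers()
for name, present in available.items():
    print(f"{name}: {'available' if present else 'not available'}")
```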

  • The changes simplified compile.c, which was previously one of the largest handwritten files in CPython

  • Future possibilities include exposing compilation stages through the standard library if compelling use cases emerge

  • The refactoring helps make CPython more accessible to contributors by breaking down complex compilation steps