Braxton Cuneo - Injecting Python Functions into a Template-Driven CUDA C++ Framework | SciPy 2024
Learn how to inject Python functions into CUDA C++ code while abstracting GPU complexity. See real examples of bridging languages via templates and FFI for scientific computing.
- Framework allows injecting Python functions into CUDA C++ code while abstracting away GPU complexity from nuclear scientists and domain experts
- Utilizes templates and FFI (Foreign Function Interface) to bridge Python and C++, with Harmonize serving as middleware between MCDC (a Python framework) and CUDA
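A minimal sketch of that bridge, using Numba's `cuda.compile_ptx` rather than Harmonize's own tooling; the function name and cross-section formula are invented for illustration, and the C ABI option assumes a reasonably recent Numba release:

```python
from numba import cuda, float64

# Domain-expert code: ordinary-looking Python, no GPU details.
def total_cross_section(energy, density):
    return density * (1.0 / (energy + 1.0))

# Compile to PTX as a device function with a C-style ABI so the generated
# symbol can be referenced from CUDA C++ code.
ptx, return_type = cuda.compile_ptx(
    total_cross_section,
    (float64, float64),
    device=True,
    abi="c",
)

# On the CUDA C++ template side, a matching extern "C" __device__
# declaration of the generated symbol is enough; the PTX is then linked
# in during the framework's build step.
print(return_type)   # float64, inferred by Numba
```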
- Asynchronous programming model where calls are not immediately executed but can be scheduled and potentially run on different hardware
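A toy, pure-Python sketch of that idea (the names `async_call`, `run_scheduler`, and `scatter` are hypothetical, not Harmonize's API): a call is recorded in a queue instead of executing immediately, and a separate scheduler decides when and where it actually runs:

```python
from collections import deque

work_queue = deque()

def async_call(fn, *args):
    # Record the call; nothing executes yet.
    work_queue.append((fn, args))

def run_scheduler():
    # Drain the queue; a GPU runtime could instead batch calls to the same
    # function and dispatch them across threads on different hardware.
    while work_queue:
        fn, args = work_queue.popleft()
        fn(*args)

def scatter(particle_id):
    print(f"scatter particle {particle_id}")

async_call(scatter, 0)   # scheduled, not run
async_call(scatter, 1)
run_scheduler()          # both calls execute here
```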
- System handles memory management, divergence reduction, and GPU-specific optimizations automatically so scientists can focus on physics and algorithms
- Special handling required for data types and alignment issues when working between Python/Numba and CUDA (see the record-layout sketch after this list):
  - Cannot use nested records directly
  - Must handle zero-size arrays carefully
  - Need proper alignment for struct members
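A sketch of a record layout that stays within those constraints, using NumPy and Numba (the field names are illustrative, not MCDC's actual particle type): nested records are flattened, there are no zero-size arrays, and `align=True` pads fields so the Python-side layout can match an equivalent C++ struct:

```python
import numpy as np
from numba import from_dtype

# Flat fields only: a nested "position" record is flattened into x/y/z.
particle_dtype = np.dtype(
    [
        ("x", np.float64),
        ("y", np.float64),
        ("z", np.float64),
        ("energy", np.float64),
        ("group", np.int32),
        ("alive", np.uint8),
    ],
    align=True,   # insert padding so member offsets match the C++ struct
)

ParticleRecord = from_dtype(particle_dtype)   # usable inside @cuda.jit kernels
print(particle_dtype.itemsize)  # padded size must match sizeof(struct) on the C++ side
```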
- Provides automatic management for the following (the sketch after this list shows the equivalent steps written by hand):
  - Device memory allocation
  - Data movement between CPU/GPU
  - Work scheduling
  - Thread coordination
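For contrast, here is roughly what those steps look like when done manually with Numba's CUDA API; the framework performs the equivalent allocation, transfer, and launch bookkeeping on the user's behalf (the `scale` kernel is just a stand-in workload):

```python
import numpy as np
from numba import cuda

@cuda.jit
def scale(out, data, factor):
    i = cuda.grid(1)
    if i < data.size:
        out[i] = data[i] * factor

host_data = np.arange(1_000_000, dtype=np.float64)

d_data = cuda.to_device(host_data)        # explicit host -> device copy
d_out = cuda.device_array_like(d_data)    # explicit device allocation

threads = 256                             # explicit launch configuration
blocks = (host_data.size + threads - 1) // threads
scale[blocks, threads](d_out, d_data, 2.0)

result = d_out.copy_to_host()             # explicit device -> host copy
```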
- Performance optimizations include (a shared-memory sketch follows this list):
  - Shared memory usage
  - Thread divergence reduction
  - Load balancing
  - Locality optimization
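A small sketch of the shared-memory half of that list: a per-block tree reduction that stages values in fast on-chip memory, with strided indexing so threads in the same warp take the same branch, which is what keeps divergence down. This is generic Numba CUDA code, not Harmonize's internals:

```python
import numpy as np
from numba import cuda, float64

THREADS = 128   # block size; also the shared-memory tile size

@cuda.jit
def block_sum(partial, data):
    tile = cuda.shared.array(THREADS, dtype=float64)
    i = cuda.grid(1)
    t = cuda.threadIdx.x

    # Stage one value per thread into shared memory.
    tile[t] = data[i] if i < data.size else 0.0
    cuda.syncthreads()

    # Tree reduction: strided so neighbouring threads stay on the same branch.
    stride = THREADS // 2
    while stride > 0:
        if t < stride:
            tile[t] += tile[t + stride]
        cuda.syncthreads()
        stride //= 2

    if t == 0:
        partial[cuda.blockIdx.x] = tile[0]

data = np.random.rand(1 << 20)
blocks = (data.size + THREADS - 1) // THREADS
partial = np.zeros(blocks)
block_sum[blocks, THREADS](partial, data)
print(partial.sum())   # approximately data.sum()
```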
- Framework is generic and could potentially support:
  - AMD GPUs (planned)
  - Other language bindings
  - Different backend runtimes
- Open source implementation available, with automated tooling to handle the complex linking and compilation steps
- Particularly useful for Monte Carlo simulations requiring many parallel computations, like neutron transport problems
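To ground that use case, a toy Monte Carlo kernel in Numba CUDA (not MCDC itself): each thread samples one particle's distance to first collision for an assumed total cross section, which is exactly the kind of embarrassingly parallel sampling these frameworks target:

```python
import math
import numpy as np
from numba import cuda
from numba.cuda.random import (
    create_xoroshiro128p_states,
    xoroshiro128p_uniform_float32,
)

@cuda.jit
def sample_collision_distance(distances, rng_states, sigma_t):
    i = cuda.grid(1)
    if i < distances.size:
        xi = xoroshiro128p_uniform_float32(rng_states, i)
        # Distance to first collision; 1 - xi keeps the log argument > 0.
        distances[i] = -math.log(1.0 - xi) / sigma_t

n = 1_000_000
threads = 256
blocks = (n + threads - 1) // threads

rng_states = create_xoroshiro128p_states(n, seed=42)
distances = np.zeros(n, dtype=np.float32)
sample_collision_distance[blocks, threads](distances, rng_states, 1.0)

print(distances.mean())   # approaches 1 / sigma_t for a large sample
```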