Numba 0.48
3. Numba for CUDA GPUs
3.1. Overview
3.1.1. Terminology
3.1.2. Programming model
3.1.3. Requirements
3.1.3.1. Supported GPUs
3.1.3.2. Software
3.1.3.2.1. Setting CUDA Installation Path
3.1.4. Missing CUDA Features
3.2. Writing CUDA Kernels
3.2.1. Introduction
3.2.2. Kernel declaration
3.2.3. Kernel invocation
3.2.3.1. Choosing the block size
3.2.3.2. Multi-dimensional blocks and grids
3.2.4. Thread positioning
3.2.4.1. Absolute positions
3.2.4.2. Further Reading
3.3. Memory management
3.3.1. Data transfer
3.3.1.1. Device arrays
3.3.2. Pinned memory
3.3.3. Streams
3.3.4. Shared memory and thread synchronization
3.3.5. Local memory
3.3.6. Constant memory
3.3.7. Deallocation Behavior
3.4. Writing Device Functions
3.5. Supported Python features in CUDA Python
3.5.1. Language
3.5.1.1. Execution Model
3.5.1.2. Constructs
3.5.2. Built-in types
3.5.3. Built-in functions
3.5.4. Standard library modules
3.5.4.1. cmath
3.5.4.2. math
3.5.4.3. operator
3.5.5. Numpy support
3.6. Supported Atomic Operations
3.6.1. Example
3.7. Random Number Generation
3.7.1. Example
3.8. Device management
3.8.1. Device Selection
3.9. The Device List
3.10. Examples
3.10.1. Matrix multiplication
3.11. Debugging CUDA Python with the CUDA Simulator
3.11.1. Using the simulator
3.11.2. Supported features
3.12. GPU Reduction
3.12.1. @reduce
3.12.2. class Reduce
3.13. CUDA Ufuncs and Generalized Ufuncs
3.13.1. Example: Basic Example
3.13.2. Example: Calling Device Functions
3.13.3. Generalized CUDA ufuncs
3.14. Sharing CUDA Memory
3.14.1. Sharing between processes
3.14.1.1. Export device array to another process
3.14.1.2. Import IPC memory from another process
3.15. CUDA Array Interface (Version 2)
3.15.1. Python Interface Specification
3.15.1.1. Differences with CUDA Array Interface (Version 0)
3.15.1.2. Differences with CUDA Array Interface (Version 1)
3.16. CUDA Frequently Asked Questions
3.16.1. nvprof reports “No kernels were profiled”