Numba
0.36
3. Numba for CUDA GPUs
3.1. Overview
3.1.1. Terminology
3.1.2. Programming model
3.1.3. Requirements
3.1.3.1. Supported GPUs
3.1.3.2. Software
3.1.4. Missing CUDA Features
3.2. Writing CUDA Kernels
3.2.1. Introduction
3.2.2. Kernel declaration
3.2.3. Kernel invocation
3.2.3.1. Choosing the block size
3.2.3.2. Multi-dimensional blocks and grids
3.2.4. Thread positioning
3.2.4.1. Absolute positions
3.2.4.2. Further Reading
3.3. Memory management
3.3.1. Data transfer
3.3.1.1. Device arrays
3.3.2. Pinned memory
3.3.3. Streams
3.3.4. Shared memory and thread synchronization
3.3.5. Local memory
3.3.6. SmartArrays (experimental)
3.3.7. Deallocation Behavior
3.4. Writing Device Functions
3.5. Supported Python features in CUDA Python
3.5.1. Language
3.5.1.1. Execution Model
3.5.1.2. Constructs
3.5.2. Built-in types
3.5.3. Built-in functions
3.5.4. Standard library modules
3.5.4.1. cmath
3.5.4.2. math
3.5.4.3. operator
3.6. Supported Atomic Operations
3.6.1. Example
3.7. Random Number Generation
3.7.1. Example
3.8. Device management
3.8.1. Device Selection
3.9. The Device List
3.10. Examples
3.10.1. Matrix multiplication
3.11. Debugging CUDA Python with the CUDA Simulator
3.11.1. Using the simulator
3.11.2. Supported features
3.12. GPU Reduction
3.12.1. @reduce
3.12.2. class Reduce
3.13. CUDA Ufuncs and Generalized Ufuncs
3.13.1. Example: Basic Example
3.13.2. Example: Calling Device Functions
3.13.3. Generalized CUDA ufuncs
3.14. Sharing CUDA Memory
3.14.1. Sharing between processes
3.14.1.1. Export device array to another process
3.14.1.2. Import IPC memory from another process
3.15. CUDA Frequently Asked Questions
3.15.1. nvprof reports “No kernels were profiled”