OUTDATED DOCUMENTATION

You are viewing archived documentation from the old Numba documentation site. The current documentation is located at https://numba.readthedocs.io.

CUDA Python Reference

  • CUDA Host API
    • Device Management
      • Device detection and enquiry
      • Context management
      • Device management
    • Compilation
    • Measurement
      • Profiling
      • Events
    • Stream Management
  • CUDA Kernel API
    • Kernel declaration
    • Intrinsic Attributes and Functions
      • Thread Indexing
      • Memory Management
      • Synchronization and Atomic Operations
      • Memory Fences
      • Warp Intrinsics
      • Integer Intrinsics
      • Floating Point Intrinsics
      • Control Flow Instructions
  • Memory Management
    • Device Objects
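
The listing above only names the reference sections. As an illustration of how a few of those pieces fit together, the short sketch below (not taken from the reference itself) declares a kernel with @cuda.jit, uses cuda.grid() for thread indexing, transfers data with cuda.to_device()/copy_to_host(), and launches on an explicit stream. It assumes a CUDA-capable GPU and driver are available; the array and kernel names are chosen purely for the example.

    import numpy as np
    from numba import cuda

    # Device Management: detect and select a GPU (assumes at least one CUDA device).
    print(cuda.gpus)                # enumerate detected devices
    cuda.select_device(0)           # bind the current context to device 0

    # CUDA Kernel API: kernel declaration and thread indexing.
    @cuda.jit
    def add_one(arr):
        i = cuda.grid(1)            # absolute thread index within the grid
        if i < arr.shape[0]:        # guard: the grid may be larger than the array
            arr[i] += 1.0

    data = np.zeros(1024, dtype=np.float64)

    # Memory Management: explicit host -> device transfer.
    d_data = cuda.to_device(data)

    # Stream Management: launch and copy back on a dedicated stream.
    stream = cuda.stream()
    threads_per_block = 128
    blocks_per_grid = (data.size + threads_per_block - 1) // threads_per_block
    add_one[blocks_per_grid, threads_per_block, stream](d_data)
    result = d_data.copy_to_host(stream=stream)
    stream.synchronize()            # wait for the asynchronous copy to finish

Each call used here is covered by the sections listed above: device enumeration and selection under Device Management, kernel declaration and cuda.grid() under the CUDA Kernel API, and transfers, device arrays, and streams under Memory Management and Stream Management.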