API functions that are exported to numba.cuda
Explicitly closes the context.
Destroy the current context of the current thread
Detect hardware support
Allocate an empty device ndarray. Similar to numpy.empty()
Call cuda.devicearray() with information from the array.
Create a CUDA event.
Get current device associated with the current thread
List all CUDA devices
A context manager for temporarily mapping a sequence of host ndarrays.
Allocate a mapped ndarray with a buffer that is pinned and mapped onto the device. Similar to numpy.empty()
A context manager for temporarily pinning a sequence of host ndarrays.
Allocate a numpy.ndarray with a buffer that is pinned (page-locked). Similar to numpy.empty().
Creates a new CUDA context with the selected device. The context is associated with the current thread. NumbaPro currently allows only one context per thread.
Returns a device instance
Raises exception on error.
Create a CUDA stream that represents a command queue for the device.
Synchronize current context
Allocate and transfer a numpy ndarray to the device.
To copy a numpy array from host to device:
import numpy
from numba import cuda
ary = numpy.arange(10)
d_ary = cuda.to_device(ary)
To enqueue the transfer to a stream:
stream = cuda.stream()
d_ary = cuda.to_device(ary, stream=stream)
The resulting d_ary is a DeviceNDArray.
To copy device->host:
hary = d_ary.copy_to_host()
To copy device->host to an existing array:
ary = numpy.empty(shape=d_ary.shape, dtype=d_ary.dtype)
d_ary.copy_to_host(ary)
To enqueue the transfer to a stream:
hary = d_ary.copy_to_host(stream=stream)
Bases: numba.cuda.compiler.CUDAKernelBase
Bases: numba.cuda.compiler.CUDAKernelBase
Force binding to current CUDA context
Get current active context
Bases: object
Define interface for configurable kernels
Bases: object
Get or compile CUDA function for the current active context
Uses device ID as key for cache.
Bases: object
A PTX cache that uses compute capability as a cache key
Get PTX for the current active context.
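The idea of keying a cache on compute capability can be sketched in plain Python. This is a minimal illustration only, not numba's implementation: the class name PTXCacheSketch, the fake_compile helper, and the placeholder PTX strings are all hypothetical.

```python
class PTXCacheSketch(object):
    """Hypothetical cache keyed by compute capability; not numba's class."""

    def __init__(self, compile_fn):
        self._compile_fn = compile_fn  # callable: compute capability -> PTX text
        self._cache = {}

    def get(self, compute_capability):
        # Compile only on the first request for this compute capability.
        if compute_capability not in self._cache:
            self._cache[compute_capability] = self._compile_fn(compute_capability)
        return self._cache[compute_capability]


compile_calls = []


def fake_compile(cc):
    # Stand-in for a real compiler: records the call, returns placeholder text.
    compile_calls.append(cc)
    return "ptx for sm_%d%d" % cc


cache = PTXCacheSketch(fake_compile)
first = cache.get((3, 5))
second = cache.get((3, 5))  # served from the cache, no recompilation
other = cache.get((5, 0))   # different compute capability, compiled separately
print(compile_calls)
```

Two devices with the same compute capability would share one entry under this scheme, which is the difference from a cache keyed on device ID.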
Bases: _ctypes.Structure
Bases: numba.cuda.compiler.Complex
Structure/Union member
Structure/Union member
Bases: numba.cuda.compiler.Complex
Structure/Union member
Structure/Union member
Bases: object
Bases: object
Bases: numba.typing.templates.AttributeTemplate
Bases: numba.typing.templates.AttributeTemplate
Bases: numba.typing.templates.AttributeTemplate
Bases: numba.typing.templates.AttributeTemplate
Bases: numba.typing.templates.AbstractTemplate
alias of add
Bases: numba.typing.templates.AttributeTemplate
Bases: numba.typing.templates.MacroTemplate
Bases: numba.typing.templates.MacroTemplate
Bases: numba.typing.templates.MacroTemplate
Bases: numba.typing.templates.AttributeTemplate
Bases: numba.typing.templates.MacroTemplate
Bases: numba.typing.templates.MacroTemplate
Bases: numba.typing.templates.MacroTemplate
Bases: numba.typing.templates.MacroTemplate
Bases: numba.typing.templates.MacroTemplate
Bases: numba.typing.templates.AttributeTemplate
Bases: numba.typing.templates.MacroTemplate
Bases: numba.typing.templates.MacroTemplate
Bases: numba.typing.templates.MacroTemplate
Bases: numba.typing.templates.MacroTemplate
Bases: numba.typing.templates.MacroTemplate
Bases: numba.typing.templates.ConcreteTemplate
alias of syncthreads
Bases: numba.typing.templates.AttributeTemplate
Bases: numba.typing.templates.MacroTemplate
Bases: numba.typing.templates.MacroTemplate
Bases: numba.typing.templates.MacroTemplate
Bases: numba.typing.templates.AttributeTemplate
Bases: numba.cuda.cudamath.Math_unary
acos(x)
Return the arc cosine (measured in radians) of x.
Bases: numba.cuda.cudamath.Math_unary
acosh(x)
Return the hyperbolic arc cosine (measured in radians) of x.
Bases: numba.cuda.cudamath.Math_unary
asin(x)
Return the arc sine (measured in radians) of x.
Bases: numba.cuda.cudamath.Math_unary
asinh(x)
Return the hyperbolic arc sine (measured in radians) of x.
Bases: numba.cuda.cudamath.Math_unary
atan(x)
Return the arc tangent (measured in radians) of x.
Bases: numba.typing.templates.ConcreteTemplate
atan2(y, x)
Return the arc tangent (measured in radians) of y/x. Unlike atan(y/x), the signs of both x and y are considered.
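The quadrant handling described above can be checked on the host with Python's standard math module, which documents the same semantics. This is only a host-side illustration and assumes nothing about the device implementation.

```python
import math

# A point in the second quadrant: x is negative, y is positive.
y, x = 1.0, -1.0

# atan(y / x) only sees the ratio -1.0, so the quadrant information is lost.
ratio_angle = math.atan(y / x)

# atan2 looks at the signs of both arguments and recovers the true angle.
full_angle = math.atan2(y, x)

print(ratio_angle, full_angle)
```

Here ratio_angle is -pi/4 while full_angle is 3*pi/4, up to floating-point rounding.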
Bases: numba.cuda.cudamath.Math_unary
atanh(x)
Return the hyperbolic arc tangent (measured in radians) of x.
Bases: numba.typing.templates.ConcreteTemplate
Bases: numba.cuda.cudamath.Math_unary
ceil(x)
Return the ceiling of x as a float. This is the smallest integral value >= x.
Bases: numba.cuda.cudamath.Math_binary
copysign(x, y)
Return x with the sign of y.
Bases: numba.cuda.cudamath.Math_unary
cos(x)
Return the cosine of x (measured in radians).
Bases: numba.cuda.cudamath.Math_unary
cosh(x)
Return the hyperbolic cosine of x.
Bases: numba.cuda.cudamath.Math_unary
degrees(x)
Convert angle x from radians to degrees.
Bases: numba.cuda.cudamath.Math_unary
exp(x)
Return e raised to the power of x.
Bases: numba.cuda.cudamath.Math_unary
expm1(x)
Return exp(x)-1. This function avoids the loss of precision involved in the direct evaluation of exp(x)-1 for small x.
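The loss of precision mentioned above can be observed on the host with Python's standard math module. The sketch below assumes IEEE double precision and uses the first two Taylor terms as the reference value; it illustrates the numerical effect only, not the device code.

```python
import math

x = 1e-10

# Naive evaluation: exp(x) is first rounded to a double close to 1.0,
# so most significant digits of the tiny result are lost in the subtraction.
naive = math.exp(x) - 1.0

# expm1 computes the same quantity without that intermediate rounding.
accurate = math.expm1(x)

# Reference from the Taylor series x + x**2 / 2 (higher terms are negligible here).
reference = x + x * x / 2.0

naive_rel_err = abs(naive - reference) / reference
accurate_rel_err = abs(accurate - reference) / reference
print(naive_rel_err, accurate_rel_err)
```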
Bases: numba.cuda.cudamath.Math_unary
fabs(x)
Return the absolute value of the float x.
Bases: numba.cuda.cudamath.Math_unary
floor(x)
Return the floor of x as a float. This is the largest integral value <= x.
Bases: numba.cuda.cudamath.Math_binary
fmod(x, y)
Return fmod(x, y), according to platform C. x % y may differ.
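The difference between fmod and the % operator shows up with negative operands. The following host-side sketch uses Python's standard math module and makes no claim about how % is compiled for the device.

```python
import math

# fmod follows C: the result takes the sign of the first argument.
c_style = math.fmod(-7.0, 3.0)

# Python's % operator takes the sign of the second argument instead.
python_style = -7.0 % 3.0

print(c_style, python_style)  # -1.0 2.0
```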
Bases: numba.typing.templates.ConcreteTemplate
isinf(x) -> bool
Check if float x is infinite (positive or negative).
Bases: numba.typing.templates.ConcreteTemplate
isnan(x) -> bool
Check if float x is not a number (NaN).
Bases: numba.cuda.cudamath.Math_unary
log(x[, base])
Return the logarithm of x to the given base. If the base is not specified, returns the natural logarithm (base e) of x.
Bases: numba.cuda.cudamath.Math_unary
log10(x)
Return the base 10 logarithm of x.
Bases: numba.cuda.cudamath.Math_unary
log1p(x)
Return the natural logarithm of 1+x (base e). The result is computed in a way which is accurate for x near zero.
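The accuracy claim for x near zero can be illustrated on the host with Python's standard math module, assuming IEEE double precision. This is a sketch of the numerical behaviour, not of the device implementation.

```python
import math

x = 1e-16

# 1.0 + x rounds to exactly 1.0 in double precision, so the naive form returns 0.0.
naive = math.log(1.0 + x)

# log1p never forms 1.0 + x explicitly and keeps the small result.
accurate = math.log1p(x)

print(naive, accurate)
```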
Bases: numba.typing.templates.ConcreteTemplate
pow(x, y)
Return x**y (x to the power of y).
Bases: numba.cuda.cudamath.Math_unary
radians(x)
Convert angle x from degrees to radians.
Bases: numba.cuda.cudamath.Math_unary
sin(x)
Return the sine of x (measured in radians).
Bases: numba.cuda.cudamath.Math_unary
sinh(x)
Return the hyperbolic sine of x.
Bases: numba.cuda.cudamath.Math_unary
sqrt(x)
Return the square root of x.
Bases: numba.cuda.cudamath.Math_unary
tan(x)
Return the tangent of x (measured in radians).
Bases: numba.cuda.cudamath.Math_unary
tanh(x)
Return the hyperbolic tangent of x.
Bases: numba.cuda.cudamath.Math_unary
trunc(x:Real) -> Integral
Truncates x to the nearest Integral toward 0. Uses the __trunc__ magic method.
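The three rounding functions documented in this section differ only for non-integral inputs, and the difference is easiest to see with a negative value. A host-side sketch using Python's standard math module follows; whether floor and ceil return a float or an int depends on the Python version, so only the values are compared.

```python
import math

x = -2.7

truncated = math.trunc(x)  # rounds toward zero
floored = math.floor(x)    # largest integral value <= x
ceiled = math.ceil(x)      # smallest integral value >= x

print(truncated, floored, ceiled)
```

For negative x, trunc agrees with ceil; for positive x, it agrees with floor.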
Bases: numba.typing.templates.ConcreteTemplate
JIT-compile at the call site. No function signature is needed because the argument types are captured at call time. Each compiled signature of the kernel is cached for future use.
Note
Can only compile CUDA kernels.
Example:
import numpy
from numba import cuda
@cuda.autojit
def foo(aryA, aryB):
    ...
aryA = numpy.arange(10, dtype=numpy.int32)
aryB = numpy.arange(10, dtype=numpy.float32)
foo[griddim, blockdim](aryA, aryB)
In the above code, a version of foo with the signature “void(int32[:], float32[:])” is compiled.
JIT-compile a Python function conforming to the CUDA-Python specification.
To define a CUDA kernel that takes two 1D int32 arrays:
@cuda.jit('void(int32[:], int32[:])')
def foo(aryA, aryB):
    ...
Note
A kernel cannot have any return value.
To launch the CUDA kernel:
griddim = 1, 2
blockdim = 3, 4
foo[griddim, blockdim](aryA, aryB)
griddim is the number of thread blocks per grid. It can be:
blockdim is the number of threads per block. It can be:
The above code is equivalent to the following CUDA-C:
dim3 griddim(1, 2);
dim3 blockdim(3, 4);
foo<<<griddim, blockdim>>>(aryA, aryB);
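The launch configuration above determines how many threads run in total. The arithmetic can be sketched in plain Python; the helper blocks_needed is hypothetical (it is not part of numba) and shows one common way to choose a 1D grid size that covers n elements.

```python
def blocks_needed(n, threads_per_block):
    """Hypothetical helper: smallest block count whose threads cover n elements."""
    return (n + threads_per_block - 1) // threads_per_block


griddim = 1, 2
blockdim = 3, 4

# Threads in one block: product of the block dimensions.
threads_per_block = blockdim[0] * blockdim[1]

# Threads in the whole launch: blocks per grid times threads per block.
total_threads = griddim[0] * griddim[1] * threads_per_block

print(threads_per_block, total_threads)  # 12 24
print(blocks_needed(1000, 256))          # 4
```

When n is not a multiple of the block size, the last block has surplus threads, so a kernel written this way usually needs a bounds check on its global index.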
To access the compiled PTX code:
print(foo.ptx)
To define a CUDA device function that takes two ints and returns an int:
@cuda.jit('int32(int32, int32)', device=True)
def bar(a, b):
    ...
To force inline the device function:
@cuda.jit('int32(int32, int32)', device=True, inline=True)
def bar_forced_inline(a, b):
    ...
A device function can only be used inside another kernel. It cannot be called from the host.
Using bar in a CUDA kernel:
@cuda.jit('void(int32[:], int32[:], int32[:])')
def use_bar(aryA, aryB, aryOut):
    i = cuda.grid(1)  # global position of the thread for a 1D grid.
    aryOut[i] = bar(aryA[i], aryB[i])
Bases: numba.targets.options.TargetOptions
Bases: numba.targets.descriptors.TargetDescriptor
alias of CPUTargetOptions
Bases: object
Disable the compilation of new signatures at call time.
alias of CUDATarget
Bases: numba.targets.descriptors.TargetDescriptor
alias of CUDATargetOptions
Bases: numba.targets.options.TargetOptions
Bases: exceptions.RuntimeError
Bases: object
This script specifies all PTX special objects.
Bases: object
A stub object to represent special objects that are meaningless outside the context of CUDA-Python.
Bases: numba.cuda.stubs.Stub
atomic namespace
Bases: numba.cuda.stubs.Stub
add(ary, idx, val)
Perform atomic ary[idx] += val
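Why the update has to be atomic can be illustrated on the host: many workers incrementing the same element must not interleave their read-modify-write steps. The sketch below models this with Python threads and a lock standing in for the hardware guarantee. The function atomic_add_model is hypothetical and is not the numba API.

```python
import threading

counts = [0]
lock = threading.Lock()  # stands in for the atomicity the device provides


def atomic_add_model(ary, idx, val):
    """Host-side model of ary[idx] += val as one indivisible step."""
    with lock:
        ary[idx] = ary[idx] + val


def worker():
    # Each worker plays the role of one GPU thread updating a shared counter.
    for _ in range(1000):
        atomic_add_model(counts, 0, 1)


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counts[0])  # 8000: no increment is lost
```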
Bases: numba.cuda.stubs.Stub
blockDim.{x, y, z}
Bases: numba.cuda.stubs.Stub
blockIdx.{x, y}
Bases: numba.cuda.stubs.Stub
shared namespace
Bases: numba.cuda.stubs.Stub
gridDim.{x, y}
grid(ndim)
ndim: [int] 1 or 2
if ndim == 1:
    return cuda.threadIdx.x + cuda.blockIdx.x * cuda.blockDim.x
elif ndim == 2:
    x = cuda.threadIdx.x + cuda.blockIdx.x * cuda.blockDim.x
    y = cuda.threadIdx.y + cuda.blockIdx.y * cuda.blockDim.y
    return x, y
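The index arithmetic of grid(ndim) can be modelled on the host to see that every thread in a launch receives a distinct global position. In this sketch, Dim3, grid_1d, and grid_2d are hypothetical stand-ins written for illustration; they are not numba objects.

```python
from collections import namedtuple

# Hypothetical stand-in for threadIdx / blockIdx / blockDim; not a numba type.
Dim3 = namedtuple("Dim3", ["x", "y", "z"])


def grid_1d(thread_idx, block_idx, block_dim):
    """Host-side model of grid(1): global x position of one thread."""
    return thread_idx.x + block_idx.x * block_dim.x


def grid_2d(thread_idx, block_idx, block_dim):
    """Host-side model of grid(2): global (x, y) position of one thread."""
    x = thread_idx.x + block_idx.x * block_dim.x
    y = thread_idx.y + block_idx.y * block_dim.y
    return x, y


# Launch shape: 3 blocks of 4 threads along x.
block_dim = Dim3(4, 1, 1)
positions = [
    grid_1d(Dim3(t, 0, 0), Dim3(b, 0, 0), block_dim)
    for b in range(3)
    for t in range(block_dim.x)
]
print(positions)  # the 12 threads get the distinct indices 0..11

# 2D case: thread (1, 2) of block (2, 1) with 4 x 3 threads per block.
point = grid_2d(Dim3(1, 2, 0), Dim3(2, 1, 0), Dim3(4, 3, 1))
print(point)  # (9, 5)
```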
gridsize(ndim)
ndim: [int] 1 or 2
if ndim == 1:
    return cuda.blockDim.x * cuda.gridDim.x
elif ndim == 2:
    x = cuda.blockDim.x * cuda.gridDim.x
    y = cuda.blockDim.y * cuda.gridDim.y
    return x, y
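A common use of the total grid size is as the stride of a loop in which each thread handles several elements. The host-side sketch below models that pattern in plain Python; gridsize_1d and visited_indices are hypothetical helpers, and the launch shape is chosen only for illustration.

```python
def gridsize_1d(block_dim_x, grid_dim_x):
    """Host-side model of gridsize(1): total number of threads along x."""
    return block_dim_x * grid_dim_x


def visited_indices(start, stride, n):
    """Elements one thread would process when stepping by the grid size."""
    return list(range(start, n, stride))


n = 10                          # number of elements to process
block_dim_x, grid_dim_x = 2, 2  # 2 blocks of 2 threads
stride = gridsize_1d(block_dim_x, grid_dim_x)

work = {}
for block in range(grid_dim_x):
    for thread in range(block_dim_x):
        start = thread + block * block_dim_x  # same arithmetic as grid(1)
        work[start] = visited_indices(start, stride, n)

print(work)  # {0: [0, 4, 8], 1: [1, 5, 9], 2: [2, 6], 3: [3, 7]}
```

Every element is visited exactly once even though there are fewer threads than elements.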
Bases: numba.cuda.stubs.Stub
shared namespace
Bases: numba.cuda.stubs.Stub
shared namespace
Bases: numba.cuda.stubs.Stub
syncthreads()
Synchronizes all threads in the thread block.
Bases: numba.cuda.stubs.Stub
threadIdx.{x, y, z}
Bases: numba.targets.base.BaseContext
Insert a constant string in the constant address space and return a generic i8 pointer to the data.
This function attempts to deduplicate.
Return dummy value.
XXX: We should be able to move cuda.const.array_like into here.
Run O1 function passes
Bases: numba.typing.context.BaseContext