numba.cuda package

Submodules

numba.cuda.api module

API functions that are exported to numba.cuda

numba.cuda.api.close()

Explicitly closes the context.

Destroy the current context of the current thread

numba.cuda.api.defer_cleanup(*args, **kwds)
numba.cuda.api.detect()

Detect hardware support

numba.cuda.api.device_array(shape, dtype=np.float, strides=None, order='C', stream=0)

Allocate an empty device ndarray. Similar to numpy.empty()

numba.cuda.api.device_array_like(ary, stream=0)

Call cuda.device_array() with information from the array.

numba.cuda.api.event(timing=True)

Create a CUDA event.

numba.cuda.api.get_current_device()

Get current device associated with the current thread

numba.cuda.api.list_devices()

List all CUDA devices

numba.cuda.api.mapped(*args, **kws)

A context manager for temporarily mapping a sequence of host ndarrays.

numba.cuda.api.mapped_array(shape, dtype=np.float, strides=None, order='C', stream=0, portable=False, wc=False)

Allocate a mapped ndarray with a buffer that is pinned and mapped on to the device. Similar to numpy.empty()

Parameters:
  • portable – a boolean flag to allow the allocated device memory to be usable in multiple devices.
  • wc – a boolean flag to enable write-combined allocation, which is faster to write by the host and to read by the device, but slower to read by the host and slower to write by the device.
numba.cuda.api.pinned(*args, **kws)

A context manager for temporarily pinning a sequence of host ndarrays.

numba.cuda.api.pinned_array(shape, dtype=np.float, strides=None, order='C')

Allocate a numpy.ndarray with a buffer that is pinned (pagelocked). Similar to numpy.empty().

numba.cuda.api.select_device(device_id)

Creates a new CUDA context with the selected device. The context is associated with the current thread. NumbaPro currently allows only one context per thread.

Returns a device instance

Raises exception on error.

numba.cuda.api.stream()

Create a CUDA stream that represents a command queue for the device.

numba.cuda.api.synchronize()

Synchronize current context

numba.cuda.api.to_device(ary, stream=0, copy=True, to=None)

Allocate and transfer a numpy ndarray to the device.

To copy a numpy array from host to device:

import numpy
from numba import cuda

ary = numpy.arange(10)
d_ary = cuda.to_device(ary)

To enqueue the transfer to a stream:

stream = cuda.stream()
d_ary = cuda.to_device(ary, stream=stream)

The resulting d_ary is a DeviceNDArray.

To copy device->host:

hary = d_ary.copy_to_host()

To copy device->host to an existing array:

ary = numpy.empty(shape=d_ary.shape, dtype=d_ary.dtype)
d_ary.copy_to_host(ary)

To enqueue the transfer to a stream:

hary = d_ary.copy_to_host(stream=stream)

numba.cuda.compiler module

class numba.cuda.compiler.AutoJitCUDAKernel(func, bind, targetoptions)

Bases: numba.cuda.compiler.CUDAKernelBase

specialize(*args)
class numba.cuda.compiler.CUDAKernel(llvm_module, name, argtypes, link=(), debug=False, exceptions={})

Bases: numba.cuda.compiler.CUDAKernelBase

bind()

Force binding to current CUDA context

device

Get current active context

ptx
class numba.cuda.compiler.CUDAKernelBase

Bases: object

Define interface for configurable kernels

configure(griddim, blockdim, stream=0, sharedmem=0)
copy()
class numba.cuda.compiler.CachedCUFunction(entry_name, ptx, linking)

Bases: object

Get or compile CUDA function for the current active context

Uses device ID as key for cache.

get()
get_info()
class numba.cuda.compiler.CachedPTX(llvmir)

Bases: object

A PTX cache that uses compute capability as a cache key

get()

Get PTX for the current active context.

class numba.cuda.compiler.Complex(val)

Bases: _ctypes.Structure

class numba.cuda.compiler.Complex128(val)

Bases: numba.cuda.compiler.Complex

imag

Structure/Union member

real

Structure/Union member

class numba.cuda.compiler.Complex64(val)

Bases: numba.cuda.compiler.Complex

imag

Structure/Union member

real

Structure/Union member

class numba.cuda.compiler.DeviceFunction(cres)

Bases: object

class numba.cuda.compiler.ExternFunction(name, sig)

Bases: object

numba.cuda.compiler.compile_cuda(pyfunc, return_type, args, debug)
numba.cuda.compiler.compile_device(pyfunc, return_type, args, inline=True, debug=False)
numba.cuda.compiler.compile_kernel(pyfunc, args, link, debug=False)
numba.cuda.compiler.declare_device_function(name, restype, argtypes)

numba.cuda.cudadecl module

class numba.cuda.cudadecl.CudaAtomicTemplate(context)

Bases: numba.typing.templates.AttributeTemplate

key = Module(<class 'numba.cuda.stubs.atomic'>)
resolve_add(mod)
class numba.cuda.cudadecl.CudaConstModuleTemplate(context)

Bases: numba.typing.templates.AttributeTemplate

key = Module(<class 'numba.cuda.stubs.const'>)
resolve_array_like(mod)
class numba.cuda.cudadecl.CudaLocalModuleTemplate(context)

Bases: numba.typing.templates.AttributeTemplate

key = Module(<class 'numba.cuda.stubs.local'>)
resolve_array(mod)
class numba.cuda.cudadecl.CudaModuleTemplate(context)

Bases: numba.typing.templates.AttributeTemplate

key = Module(<module 'numba.cuda' from '/Users/stan/anaconda/envs/numba_dev/lib/python2.7/site-packages/numba/cuda/__init__.pyc'>)
resolve_atomic(mod)
resolve_blockDim(mod)
resolve_blockIdx(mod)
resolve_const(mod)
resolve_grid(mod)
resolve_gridDim(mod)
resolve_gridsize(mod)
resolve_local(mod)
resolve_shared(mod)
resolve_syncthreads(mod)
resolve_threadIdx(mod)
class numba.cuda.cudadecl.CudaSharedModuleTemplate(context)

Bases: numba.typing.templates.AttributeTemplate

key = Module(<class 'numba.cuda.stubs.shared'>)
resolve_array(mod)
class numba.cuda.cudadecl.Cuda_atomic_add(context)

Bases: numba.typing.templates.AbstractTemplate

generic(args, kws)
key

alias of add

class numba.cuda.cudadecl.Cuda_blockDim(context)

Bases: numba.typing.templates.AttributeTemplate

key = Module(<class 'numba.cuda.stubs.blockDim'>)
resolve_x(mod)
resolve_y(mod)
resolve_z(mod)
class numba.cuda.cudadecl.Cuda_blockDim_x

Bases: numba.typing.templates.MacroTemplate

key = <macro ntid.x -> () -> int32>
class numba.cuda.cudadecl.Cuda_blockDim_y

Bases: numba.typing.templates.MacroTemplate

key = <macro ntid.y -> () -> int32>
class numba.cuda.cudadecl.Cuda_blockDim_z

Bases: numba.typing.templates.MacroTemplate

key = <macro ntid.z -> () -> int32>
class numba.cuda.cudadecl.Cuda_blockIdx(context)

Bases: numba.typing.templates.AttributeTemplate

key = Module(<class 'numba.cuda.stubs.blockIdx'>)
resolve_x(mod)
resolve_y(mod)
resolve_z(mod)
class numba.cuda.cudadecl.Cuda_blockIdx_x

Bases: numba.typing.templates.MacroTemplate

key = <macro ctaid.x -> () -> int32>
class numba.cuda.cudadecl.Cuda_blockIdx_y

Bases: numba.typing.templates.MacroTemplate

key = <macro ctaid.y -> () -> int32>
class numba.cuda.cudadecl.Cuda_blockIdx_z

Bases: numba.typing.templates.MacroTemplate

key = <macro ctaid.z -> () -> int32>
class numba.cuda.cudadecl.Cuda_const_arraylike

Bases: numba.typing.templates.MacroTemplate

key = <macro const.array_like -> <function const_array_like at 0x10837c320>>
class numba.cuda.cudadecl.Cuda_grid

Bases: numba.typing.templates.MacroTemplate

key = <macro ptx.grid -> <function grid_expand at 0x10837c140>>
class numba.cuda.cudadecl.Cuda_gridDim(context)

Bases: numba.typing.templates.AttributeTemplate

key = Module(<class 'numba.cuda.stubs.gridDim'>)
resolve_x(mod)
resolve_y(mod)
resolve_z(mod)
class numba.cuda.cudadecl.Cuda_gridDim_x

Bases: numba.typing.templates.MacroTemplate

key = <macro nctaid.x -> () -> int32>
class numba.cuda.cudadecl.Cuda_gridDim_y

Bases: numba.typing.templates.MacroTemplate

key = <macro nctaid.y -> () -> int32>
class numba.cuda.cudadecl.Cuda_gridDim_z

Bases: numba.typing.templates.MacroTemplate

key = <macro nctaid.z -> () -> int32>
class numba.cuda.cudadecl.Cuda_gridsize

Bases: numba.typing.templates.MacroTemplate

key = <macro ptx.gridsize -> <function gridsize_expand at 0x10837c1b8>>
class numba.cuda.cudadecl.Cuda_local_array

Bases: numba.typing.templates.MacroTemplate

key = <macro local.array -> <function local_array at 0x10837c2a8>>
class numba.cuda.cudadecl.Cuda_shared_array

Bases: numba.typing.templates.MacroTemplate

key = <macro shared.array -> <function shared_array at 0x10837c230>>
class numba.cuda.cudadecl.Cuda_syncthreads(context)

Bases: numba.typing.templates.ConcreteTemplate

cases = [() -> none]
key

alias of syncthreads

class numba.cuda.cudadecl.Cuda_threadIdx(context)

Bases: numba.typing.templates.AttributeTemplate

key = Module(<class 'numba.cuda.stubs.threadIdx'>)
resolve_x(mod)
resolve_y(mod)
resolve_z(mod)
class numba.cuda.cudadecl.Cuda_threadIdx_x

Bases: numba.typing.templates.MacroTemplate

key = <macro tid.x -> () -> int32>
class numba.cuda.cudadecl.Cuda_threadIdx_y

Bases: numba.typing.templates.MacroTemplate

key = <macro tid.y -> () -> int32>
class numba.cuda.cudadecl.Cuda_threadIdx_z

Bases: numba.typing.templates.MacroTemplate

key = <macro tid.z -> () -> int32>

numba.cuda.cudaimpl module

numba.cuda.cudaimpl.ptx_atomic_add_intp(context, builder, sig, args)
numba.cuda.cudaimpl.ptx_atomic_add_tuple(context, builder, sig, args)
numba.cuda.cudaimpl.ptx_atomic_add_unituple(context, builder, sig, args)
numba.cuda.cudaimpl.ptx_cmem_arylike(context, builder, sig, args)
numba.cuda.cudaimpl.ptx_grid1d(context, builder, sig, args)
numba.cuda.cudaimpl.ptx_grid2d(context, builder, sig, args)
numba.cuda.cudaimpl.ptx_gridsize1d(context, builder, sig, args)
numba.cuda.cudaimpl.ptx_gridsize2d(context, builder, sig, args)
numba.cuda.cudaimpl.ptx_lmem_alloc_array(context, builder, sig, args)
numba.cuda.cudaimpl.ptx_lmem_alloc_intp(context, builder, sig, args)
numba.cuda.cudaimpl.ptx_smem_alloc_array(context, builder, sig, args)
numba.cuda.cudaimpl.ptx_smem_alloc_intp(context, builder, sig, args)
numba.cuda.cudaimpl.ptx_sreg_template(sreg)
numba.cuda.cudaimpl.ptx_syncthreads(context, builder, sig, args)

numba.cuda.cudamath module

class numba.cuda.cudamath.MathModuleAttribute(context)

Bases: numba.typing.templates.AttributeTemplate

key = Module(<module 'math' from '/Users/stan/anaconda/envs/numba_dev/lib/python2.7/lib-dynload/math.so'>)
resolve_acos(mod)
resolve_acosh(mod)
resolve_asin(mod)
resolve_asinh(mod)
resolve_atan(mod)
resolve_atan2(mod)
resolve_atanh(mod)
resolve_ceil(mod)
resolve_copysign(mod)
resolve_cos(mod)
resolve_cosh(mod)
resolve_degrees(mod)
resolve_e(mod)
resolve_exp(mod)
resolve_expm1(mod)
resolve_fabs(mod)
resolve_floor(mod)
resolve_fmod(mod)
resolve_isinf(mod)
resolve_isnan(mod)
resolve_log(mod)
resolve_log10(mod)
resolve_log1p(mod)
resolve_pi(mod)
resolve_pow(mod)
resolve_radians(mod)
resolve_sin(mod)
resolve_sinh(mod)
resolve_sqrt(mod)
resolve_tan(mod)
resolve_tanh(mod)
resolve_trunc(mod)
class numba.cuda.cudamath.Math_acos(context)

Bases: numba.cuda.cudamath.Math_unary

key()

acos(x)

Return the arc cosine (measured in radians) of x.

class numba.cuda.cudamath.Math_acosh(context)

Bases: numba.cuda.cudamath.Math_unary

key()

acosh(x)

Return the hyperbolic arc cosine (measured in radians) of x.

class numba.cuda.cudamath.Math_asin(context)

Bases: numba.cuda.cudamath.Math_unary

key()

asin(x)

Return the arc sine (measured in radians) of x.

class numba.cuda.cudamath.Math_asinh(context)

Bases: numba.cuda.cudamath.Math_unary

key()

asinh(x)

Return the hyperbolic arc sine (measured in radians) of x.

class numba.cuda.cudamath.Math_atan(context)

Bases: numba.cuda.cudamath.Math_unary

key()

atan(x)

Return the arc tangent (measured in radians) of x.

class numba.cuda.cudamath.Math_atan2(context)

Bases: numba.typing.templates.ConcreteTemplate

cases = [(int64, int64) -> float64, (uint64, uint64) -> float64, (float32, float32) -> float32, (float64, float64) -> float64]
key()

atan2(y, x)

Return the arc tangent (measured in radians) of y/x. Unlike atan(y/x), the signs of both x and y are considered.
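The quadrant handling described above can be illustrated on the host with Python's standard math module. This is only a sketch of the semantics; it assumes the CUDA target follows the same convention for the float types listed in cases.

```python
import math

# A point in the third quadrant: both coordinates are negative.
y, x = -1.0, -1.0

# atan(y / x) loses the signs: -1 / -1 == 1, so the angle lands in quadrant I.
naive = math.atan(y / x)

# atan2(y, x) inspects both signs and returns the angle in the correct quadrant.
full = math.atan2(y, x)

print(naive)  # pi / 4
print(full)   # -3 * pi / 4
```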

class numba.cuda.cudamath.Math_atanh(context)

Bases: numba.cuda.cudamath.Math_unary

key()

atanh(x)

Return the hyperbolic arc tangent (measured in radians) of x.

class numba.cuda.cudamath.Math_binary(context)

Bases: numba.typing.templates.ConcreteTemplate

cases = [(float32, float32) -> float32, (float64, float64) -> float64]
class numba.cuda.cudamath.Math_ceil(context)

Bases: numba.cuda.cudamath.Math_unary

key()

ceil(x)

Return the ceiling of x as a float. This is the smallest integral value >= x.

class numba.cuda.cudamath.Math_copysign(context)

Bases: numba.cuda.cudamath.Math_binary

key()

copysign(x, y)

Return x with the sign of y.

class numba.cuda.cudamath.Math_cos(context)

Bases: numba.cuda.cudamath.Math_unary

key()

cos(x)

Return the cosine of x (measured in radians).

class numba.cuda.cudamath.Math_cosh(context)

Bases: numba.cuda.cudamath.Math_unary

key()

cosh(x)

Return the hyperbolic cosine of x.

class numba.cuda.cudamath.Math_degrees(context)

Bases: numba.cuda.cudamath.Math_unary

key()

degrees(x)

Convert angle x from radians to degrees.

class numba.cuda.cudamath.Math_exp(context)

Bases: numba.cuda.cudamath.Math_unary

key()

exp(x)

Return e raised to the power of x.

class numba.cuda.cudamath.Math_expm1(context)

Bases: numba.cuda.cudamath.Math_unary

key()

expm1(x)

Return exp(x)-1. This function avoids the loss of precision involved in the direct evaluation of exp(x)-1 for small x.
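The loss of precision mentioned above can be observed on the host with the standard math module. The sketch below compares both approaches against the two-term series x + x**2/2, which is assumed to be accurate enough as a reference for very small x.

```python
import math

x = 1e-10

# Direct evaluation: exp(x) is rounded to a double close to 1.0 first, so
# most significant digits of the tiny result are lost in the subtraction.
naive = math.exp(x) - 1.0

# expm1 computes the same quantity without the cancellation.
accurate = math.expm1(x)

# For small x, exp(x) - 1 is approximately x + x**2 / 2.
reference = x + x * x / 2.0

naive_rel_err = abs(naive - reference) / reference
accurate_rel_err = abs(accurate - reference) / reference
print(naive_rel_err, accurate_rel_err)
```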

class numba.cuda.cudamath.Math_fabs(context)

Bases: numba.cuda.cudamath.Math_unary

key()

fabs(x)

Return the absolute value of the float x.

class numba.cuda.cudamath.Math_floor(context)

Bases: numba.cuda.cudamath.Math_unary

key()

floor(x)

Return the floor of x as a float. This is the largest integral value <= x.

class numba.cuda.cudamath.Math_fmod(context)

Bases: numba.cuda.cudamath.Math_binary

key()

fmod(x, y)

Return fmod(x, y), according to platform C. x % y may differ.
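A short host-side illustration of how fmod can differ from the % operator, using the standard math module. The difference shows up when the operands have opposite signs.

```python
import math

# fmod follows the C convention: the result takes the sign of the dividend x.
c_style = math.fmod(-7.0, 3.0)

# Python's % operator takes the sign of the divisor y instead.
py_style = -7.0 % 3.0

print(c_style)   # -1.0
print(py_style)  # 2.0
```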

class numba.cuda.cudamath.Math_isinf(context)

Bases: numba.typing.templates.ConcreteTemplate

cases = [(int64,) -> bool, (uint64,) -> bool, (float32,) -> bool, (float64,) -> bool]
key()

isinf(x) -> bool

Check if float x is infinite (positive or negative).

class numba.cuda.cudamath.Math_isnan(context)

Bases: numba.typing.templates.ConcreteTemplate

cases = [(int64,) -> bool, (uint64,) -> bool, (float32,) -> bool, (float64,) -> bool]
key()

isnan(x) -> bool

Check if float x is not a number (NaN).
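The reason dedicated isnan and isinf checks exist can be shown with plain Python floats. This is a host-side sketch using the standard math module, not device code.

```python
import math

nan = float("nan")
inf = float("inf")

# NaN compares unequal to everything, including itself, so == cannot detect it.
print(nan == nan)        # False
print(math.isnan(nan))   # True

# isinf is true for both signs of infinity, and false for NaN.
print(math.isinf(-inf))  # True
print(math.isinf(nan))   # False
```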

class numba.cuda.cudamath.Math_log(context)

Bases: numba.cuda.cudamath.Math_unary

key()

log(x[, base])

Return the logarithm of x to the given base. If the base is not specified, returns the natural logarithm (base e) of x.

class numba.cuda.cudamath.Math_log10(context)

Bases: numba.cuda.cudamath.Math_unary

key()

log10(x)

Return the base 10 logarithm of x.

class numba.cuda.cudamath.Math_log1p(context)

Bases: numba.cuda.cudamath.Math_unary

key()

log1p(x)

Return the natural logarithm of 1+x (base e). The result is computed in a way which is accurate for x near zero.

class numba.cuda.cudamath.Math_pow(context)

Bases: numba.typing.templates.ConcreteTemplate

cases = [(float32, float32) -> float32, (float64, float64) -> float64, (float32, int32) -> float32, (float64, int32) -> float64]
key()

pow(x, y)

Return x**y (x to the power of y).

class numba.cuda.cudamath.Math_radians(context)

Bases: numba.cuda.cudamath.Math_unary

key()

radians(x)

Convert angle x from degrees to radians.

class numba.cuda.cudamath.Math_sin(context)

Bases: numba.cuda.cudamath.Math_unary

key()

sin(x)

Return the sine of x (measured in radians).

class numba.cuda.cudamath.Math_sinh(context)

Bases: numba.cuda.cudamath.Math_unary

key()

sinh(x)

Return the hyperbolic sine of x.

class numba.cuda.cudamath.Math_sqrt(context)

Bases: numba.cuda.cudamath.Math_unary

key()

sqrt(x)

Return the square root of x.

class numba.cuda.cudamath.Math_tan(context)

Bases: numba.cuda.cudamath.Math_unary

key()

tan(x)

Return the tangent of x (measured in radians).

class numba.cuda.cudamath.Math_tanh(context)

Bases: numba.cuda.cudamath.Math_unary

key()

tanh(x)

Return the hyperbolic tangent of x.

class numba.cuda.cudamath.Math_trunc(context)

Bases: numba.cuda.cudamath.Math_unary

key()

trunc(x:Real) -> Integral

Truncates x to the nearest Integral toward 0. Uses the __trunc__ magic method.
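The difference between trunc, floor and ceil only shows for negative inputs. A minimal host-side sketch with the standard math module:

```python
import math

x = -2.7

toward_zero = math.trunc(x)        # drops the fractional part
toward_minus_inf = math.floor(x)   # largest integral value <= x
toward_plus_inf = math.ceil(x)     # smallest integral value >= x

# For negative inputs trunc agrees with ceil, not with floor.
print(toward_zero, toward_minus_inf, toward_plus_inf)
```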

class numba.cuda.cudamath.Math_unary(context)

Bases: numba.typing.templates.ConcreteTemplate

cases = [(int64,) -> float64, (uint64,) -> float64, (float32,) -> float32, (float64,) -> float64]

numba.cuda.decorators module

numba.cuda.decorators.autojit(func, **kws)

JIT at callsite. Function signature is not needed as this will capture the type at call time. Each signature of the kernel is cached for future use.

Note

Can only compile CUDA kernels.

Example:

import numpy
from numba import cuda

@cuda.autojit
def foo(aryA, aryB):
    ...

aryA = numpy.arange(10, dtype=numpy.int32)
aryB = numpy.arange(10, dtype=numpy.float32)
foo[griddim, blockdim](aryA, aryB)

In the above code, a version of foo with the signature “void(int32[:], float32[:])” is compiled.
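The idea of caching one compiled version per signature can be sketched in plain Python. The class below is a hypothetical stand-in written for illustration only; it is not how numba implements autojit, and the names SpecializationCache, compiled and compile_count are assumptions.

```python
class SpecializationCache(object):
    """Hypothetical call-site JIT stand-in: one cache entry per signature."""

    def __init__(self, func):
        self.func = func
        self.compiled = {}      # signature tuple -> "compiled" callable
        self.compile_count = 0  # how many specializations were created

    def _signature(self, args):
        # Use the argument type names as the cache key.
        return tuple(type(a).__name__ for a in args)

    def __call__(self, *args):
        sig = self._signature(args)
        if sig not in self.compiled:
            # A real JIT would generate code here; this sketch only
            # records that a new specialization was needed.
            self.compile_count += 1
            self.compiled[sig] = self.func
        return self.compiled[sig](*args)


@SpecializationCache
def add(a, b):
    return a + b


add(1, 2)      # first (int, int) call: specializes
add(3, 4)      # same signature: reuses the cached entry
add(1.0, 2.0)  # new (float, float) signature: specializes again
print(add.compile_count)  # 2
```

The types are captured at call time, so no signature string has to be written by hand, and repeated calls with the same argument types pay the specialization cost only once.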

numba.cuda.decorators.convert_types(restype, argtypes)
numba.cuda.decorators.declare_device(name, restype=None, argtypes=None)
numba.cuda.decorators.jit(restype=None, argtypes=None, device=False, inline=False, bind=True, link=[], debug=False, **kws)

JIT compile a python function conforming to the CUDA-Python specification.

To define a CUDA kernel that takes two int 1D-arrays:

@cuda.jit('void(int32[:], int32[:])')
def foo(aryA, aryB):
    ...

Note

A kernel cannot have any return value.

To launch the cuda kernel:

griddim = 1, 2
blockdim = 3, 4
foo[griddim, blockdim](aryA, aryB)

griddim is the number of thread blocks per grid. It can be:

  • an int;
  • tuple-1 of ints;
  • tuple-2 of ints.

blockdim is the number of threads per block. It can be:

  • an int;
  • tuple-1 of ints;
  • tuple-2 of ints;
  • tuple-3 of ints.

The above code is equivalent to the following CUDA-C:

dim3 griddim(1, 2);
dim3 blockdim(3, 4);
foo<<<griddim, blockdim>>>(aryA, aryB);
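When the launch configuration has to cover a given number of array elements, griddim is commonly derived from the element count by ceiling division. The helper below is a host-side sketch; blocks_needed and the sizes used are hypothetical names and values chosen for illustration.

```python
def blocks_needed(n_items, threads_per_block):
    """Smallest number of blocks whose threads cover n_items (ceiling division)."""
    return (n_items + threads_per_block - 1) // threads_per_block


n_items = 1000
blockdim = 256
griddim = blocks_needed(n_items, blockdim)

total_threads = griddim * blockdim
print(griddim)        # 4
print(total_threads)  # 1024
```

Because the total thread count can exceed the number of elements (1024 versus 1000 here), a kernel launched this way usually needs a bounds check before indexing its arrays.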

To access the compiled PTX code:

print(foo.ptx)

To define a CUDA device function that takes two ints and returns an int:

@cuda.jit('int32(int32, int32)', device=True)
def bar(a, b):
    ...

To force inline the device function:

@cuda.jit('int32(int32, int32)', device=True, inline=True)
def bar_forced_inline(a, b):
    ...

A device function can only be called from a kernel or from another device function. It cannot be called from the host.

Using bar in a CUDA kernel:

@cuda.jit('void(int32[:], int32[:], int32[:])')
def use_bar(aryA, aryB, aryOut):
    i = cuda.grid(1) # global position of the thread for a 1D grid.
    aryOut[i] = bar(aryA[i], aryB[i])

numba.cuda.descriptor module

class numba.cuda.descriptor.CPUTargetOptions

Bases: numba.targets.options.TargetOptions

OPTIONS = {}
class numba.cuda.descriptor.CUDATargetDesc

Bases: numba.targets.descriptors.TargetDescriptor

options

alias of CPUTargetOptions

targetctx = <numba.cuda.target.CUDATargetContext object at 0x10f8d41d0>
typingctx = <numba.cuda.target.CUDATypingContext object at 0x10c802a90>

numba.cuda.dispatcher module

class numba.cuda.dispatcher.CUDADispatcher(py_func, locals={}, targetoptions={})

Bases: object

compile(sig, locals={}, **targetoptions)
compiled
configure(*args, **kws)
disable_compile(val=True)

Disable the compilation of new signatures at call time.

targetdescr

alias of CUDATarget

class numba.cuda.dispatcher.CUDATarget

Bases: numba.targets.descriptors.TargetDescriptor

options

alias of CUDATargetOptions

class numba.cuda.dispatcher.CUDATargetOptions

Bases: numba.targets.options.TargetOptions

OPTIONS = {}

numba.cuda.errors module

exception numba.cuda.errors.KernelRuntimeError(msg, tid=None, ctaid=None)

Bases: exceptions.RuntimeError

numba.cuda.initialize module

numba.cuda.initialize.init_jit()
numba.cuda.initialize.initialize_all()

numba.cuda.libdevice module

numba.cuda.libdevice.binary_implement(nvname, ty)
numba.cuda.libdevice.bool_implement(nvname, ty)
numba.cuda.libdevice.impl32(context, builder, sig, args)
numba.cuda.libdevice.impl64(context, builder, sig, args)
numba.cuda.libdevice.powi_implement(nvname)
numba.cuda.libdevice.unary_implement(nvname, ty)

numba.cuda.nvvmutils module

class numba.cuda.nvvmutils.SRegBuilder(builder)

Bases: object

ctaid(xyz)
getdim(xyz)
nctaid(xyz)
ntid(xyz)
tid(xyz)
numba.cuda.nvvmutils.call_sreg(builder, name)
numba.cuda.nvvmutils.declare_atomic_add_float32(lmod)
numba.cuda.nvvmutils.declare_string(builder, value)
numba.cuda.nvvmutils.get_global_id(builder, dim)
numba.cuda.nvvmutils.insert_addrspace_conv(lmod, elemtype, addrspace)

numba.cuda.stubs module

This script specifies all PTX special objects.

class numba.cuda.stubs.Stub

Bases: object

A stub object to represent special objects that are meaningless outside the context of CUDA-python.

class numba.cuda.stubs.atomic

Bases: numba.cuda.stubs.Stub

atomic namespace

class add

Bases: numba.cuda.stubs.Stub

add(ary, idx, val)

Perform atomic ary[idx] += val
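What "atomic" means for ary[idx] += val can be illustrated with a host-side analogy: the read, the addition and the write-back happen as one indivisible step, so concurrent updates are never lost. The sketch below emulates this with Python threads and a lock. The function atomic_add here is a hypothetical helper, not the device intrinsic, and the GPU does not use a Python lock.

```python
import threading

counts = [0]
lock = threading.Lock()


def atomic_add(ary, idx, val):
    """Host-side analogy: the whole read-modify-write runs under a lock."""
    with lock:
        ary[idx] += val


def worker():
    for _ in range(1000):
        atomic_add(counts, 0, 1)


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counts[0])  # 8000
```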

class numba.cuda.stubs.blockDim

Bases: numba.cuda.stubs.Stub

blockDim.{x, y, z}

x = <macro ntid.x -> () -> int32>
y = <macro ntid.y -> () -> int32>
z = <macro ntid.z -> () -> int32>
class numba.cuda.stubs.blockIdx

Bases: numba.cuda.stubs.Stub

blockIdx.{x, y, z}

x = <macro ctaid.x -> () -> int32>
y = <macro ctaid.y -> () -> int32>
z = <macro ctaid.z -> () -> int32>
class numba.cuda.stubs.const

Bases: numba.cuda.stubs.Stub

const namespace

array_like = <macro const.array_like -> <function const_array_like at 0x10837c320>>
numba.cuda.stubs.const_array_like(ndarray)
class numba.cuda.stubs.gridDim

Bases: numba.cuda.stubs.Stub

gridDim.{x, y, z}

x = <macro nctaid.x -> () -> int32>
y = <macro nctaid.y -> () -> int32>
z = <macro nctaid.z -> () -> int32>
numba.cuda.stubs.grid_expand(ndim)

grid(ndim)

ndim: [int] 1 or 2

if ndim == 1:
    return cuda.threadIdx.x + cuda.blockIdx.x * cuda.blockDim.x
elif ndim == 2:
    x = cuda.threadIdx.x + cuda.blockIdx.x * cuda.blockDim.x
    y = cuda.threadIdx.y + cuda.blockIdx.y * cuda.blockDim.y
    return x, y
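The ndim == 1 formula above can be checked on the host by enumerating every (block, thread) pair of a small launch. This is a plain-Python emulation; global_index_1d and the launch sizes are hypothetical and chosen for illustration.

```python
def global_index_1d(thread_idx, block_idx, block_dim):
    """Host-side mirror of the ndim == 1 formula."""
    return thread_idx + block_idx * block_dim


block_dim = 4   # threads per block
grid_dim = 3    # blocks per grid

# Enumerate every (block, thread) pair the way a launch would.
indices = [
    global_index_1d(t, b, block_dim)
    for b in range(grid_dim)
    for t in range(block_dim)
]
print(indices)  # the integers 0 through 11, each exactly once
```

Each thread receives a distinct global position, which is why a kernel can use the result directly as an array index.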
numba.cuda.stubs.gridsize_expand(ndim)

gridsize(ndim)

ndim: [int] 1 or 2

if ndim == 1:
    return cuda.blockDim.x * cuda.gridDim.x
elif ndim == 2:
    x = cuda.blockDim.x * cuda.gridDim.x
    y = cuda.blockDim.y * cuda.gridDim.y
    return x, y
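A common use of the grid size is as the stride of a loop that lets a fixed number of threads process an array larger than the grid. The plain-Python emulation below checks that such a loop visits every element exactly once; covered_by_thread and the sizes are hypothetical.

```python
def covered_by_thread(start, stride, n_items):
    """Indices one emulated thread visits in a grid-stride loop."""
    return list(range(start, n_items, stride))


block_dim, grid_dim = 4, 2
stride = block_dim * grid_dim   # the ndim == 1 gridsize formula
n_items = 20                    # more items than threads

visited = []
for start in range(stride):     # one iteration per emulated thread
    visited.extend(covered_by_thread(start, stride, n_items))

print(sorted(visited) == list(range(n_items)))  # True
```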
class numba.cuda.stubs.local

Bases: numba.cuda.stubs.Stub

local namespace

array = <macro local.array -> <function local_array at 0x10837c2a8>>
numba.cuda.stubs.local_array(shape, dtype)
class numba.cuda.stubs.shared

Bases: numba.cuda.stubs.Stub

shared namespace

array = <macro shared.array -> <function shared_array at 0x10837c230>>
numba.cuda.stubs.shared_array(shape, dtype)
class numba.cuda.stubs.syncthreads

Bases: numba.cuda.stubs.Stub

syncthreads()

Synchronizes all threads in the thread block.

class numba.cuda.stubs.threadIdx

Bases: numba.cuda.stubs.Stub

threadIdx.{x, y, z}

x = <macro tid.x -> () -> int32>
y = <macro tid.y -> () -> int32>
z = <macro tid.z -> () -> int32>

numba.cuda.target module

class numba.cuda.target.CUDATargetContext(typing_context)

Bases: numba.targets.base.BaseContext

generate_kernel_wrapper(func, argtypes)
get_abi_sizeof(lty)
implement_powi_as_math_call = True
init()
insert_string_const_addrspace(builder, string)

Insert a constant string in the constant address space and return a generic i8 pointer to the data.

This function attempts to deduplicate.

make_constant_array(builder, typ, ary)

Return dummy value.

XXX: We should be able to move cuda.const.array_like into here.

mangler(name, argtypes)
optimize_function(func)

Run O1 function passes

prepare_cuda_kernel(func, argtypes)
strict_alignment = True
class numba.cuda.target.CUDATypingContext

Bases: numba.typing.context.BaseContext

init()

numba.cuda.testing module

class numba.cuda.testing.CUDATestCase(methodName='runTest')

Bases: unittest.case.TestCase

tearDown()

Module contents

numba.cuda.cuda_error()

Returns None if the CUDA driver initialized successfully, or the exception if the driver failed to initialize.

numba.cuda.is_available()

Returns a boolean to indicate the availability of a CUDA GPU.

This will initialize the driver if it hasn’t been initialized.

numba.cuda.test()