numba.cuda package¶
Subpackages¶
- numba.cuda.cudadrv package
- Submodules
- numba.cuda.cudadrv.devicearray module
- numba.cuda.cudadrv.devices module
- numba.cuda.cudadrv.driver module
- numba.cuda.cudadrv.drvapi module
- numba.cuda.cudadrv.enums module
- numba.cuda.cudadrv.error module
- numba.cuda.cudadrv.libs module
- numba.cuda.cudadrv.ndarray module
- numba.cuda.cudadrv.nvvm module
- Module contents
Submodules¶
numba.cuda.api module¶
APIs that are exported to numba.cuda
- numba.cuda.api.close()¶
Explicitly closes the context.
Destroy the current context of the current thread
- numba.cuda.api.defer_cleanup(*args, **kwds)¶
- numba.cuda.api.detect()¶
Detect hardware support
- numba.cuda.api.device_array(shape, dtype=np.float, strides=None, order='C', stream=0)¶
Allocate an empty device ndarray. Similar to numpy.empty()
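A minimal sketch of allocating a device array and filling it from a kernel; the fill kernel, sizes, and launch configuration below are illustrative, not part of this API:
    import numpy as np
    from numba import cuda

    # Allocate an uninitialized 1024-element float32 array on the device.
    d_out = cuda.device_array(shape=1024, dtype=np.float32)

    @cuda.jit('void(float32[:])')
    def fill(out):
        i = cuda.grid(1)
        if i < out.shape[0]:
            out[i] = i * 2.0

    fill[4, 256](d_out)            # 4 blocks of 256 threads
    result = d_out.copy_to_host()  # copy back into a new host array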
- numba.cuda.api.device_array_like(ary, stream=0)¶
Call device_array() with information (shape, dtype, strides) taken from the array.
- numba.cuda.api.event(timing=True)¶
Create a CUDA event.
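A minimal timing sketch; it assumes the returned event object exposes record(), synchronize() and elapsed_time(), mirroring the CUDA driver API:
    import numpy as np
    from numba import cuda

    ary = np.arange(1 << 20, dtype=np.float32)

    start = cuda.event(timing=True)
    stop = cuda.event(timing=True)

    start.record()               # mark the start on the default stream
    d_ary = cuda.to_device(ary)  # host -> device copy
    stop.record()
    stop.synchronize()           # block until the stop event has completed

    elapsed_ms = start.elapsed_time(stop)  # assumed Event method; milliseconds between events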
- numba.cuda.api.get_current_device()¶
Get current device associated with the current thread
- numba.cuda.api.list_devices()¶
List all CUDA devices
- numba.cuda.api.mapped(*args, **kws)¶
A context manager for temporarily mapping a sequence of host ndarrays.
- numba.cuda.api.mapped_array(shape, dtype=np.float, strides=None, order='C', stream=0, portable=False, wc=False)¶
Allocate a mapped ndarray with a buffer that is pinned and mapped onto the device. Similar to numpy.empty().
Parameters:
- portable – a boolean flag to allow the allocated device memory to be usable on multiple devices.
- wc – a boolean flag to enable write-combined allocation, which is faster for the host to write and for the device to read, but slower for the host to read.
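A minimal zero-copy sketch, assuming a mapped array can be passed straight to a kernel; the double kernel and sizes are illustrative:
    import numpy as np
    from numba import cuda

    # Pinned host memory that is also mapped into the device address space.
    m_ary = cuda.mapped_array(1024, dtype=np.float32, wc=False)
    m_ary[:] = 1.0   # write from the host like a normal numpy array

    @cuda.jit('void(float32[:])')
    def double(a):
        i = cuda.grid(1)
        if i < a.shape[0]:
            a[i] *= 2.0

    double[4, 256](m_ary)   # the kernel reads/writes the mapped buffer in place
    cuda.synchronize()      # make sure the kernel finished before the host reads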
- numba.cuda.api.pinned(*args, **kws)¶
A context manager for temporarily pinning a sequence of host ndarrays.
- numba.cuda.api.pinned_array(shape, dtype=np.float, strides=None, order='C')¶
Allocate a numpy.ndarray with a buffer that is pinned (page-locked). Similar to numpy.empty().
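A short sketch using a pinned buffer as the endpoint of asynchronous transfers; whether the copies actually overlap with other work depends on the hardware:
    import numpy as np
    from numba import cuda

    h_buf = cuda.pinned_array(1 << 20, dtype=np.float32)  # page-locked host buffer
    h_buf[:] = np.random.rand(h_buf.size)

    stream = cuda.stream()
    d_buf = cuda.to_device(h_buf, stream=stream)   # async copy from pinned memory
    d_buf.copy_to_host(h_buf, stream=stream)       # async copy back
    stream.synchronize()                           # wait for both transfers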
- numba.cuda.api.select_device(device_id)¶
Create a new CUDA context for the selected device. The context is associated with the current thread. Numba currently allows only one context per thread.
Returns a device instance.
Raises an exception on error.
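A minimal sketch of explicit device selection; device id 0 and the name attribute are assumptions for illustration:
    from numba import cuda

    cuda.detect()                 # report the CUDA devices found
    dev = cuda.select_device(0)   # bind device 0 to this thread's context
    print(dev.name)               # device name attribute, assumed for illustration
    cuda.close()                  # tear the context down again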
- numba.cuda.api.stream()¶
Create a CUDA stream that represents a command queue for the device.
- numba.cuda.api.synchronize()¶
Synchronize current context
- numba.cuda.api.to_device(ary, stream=0, copy=True, to=None)¶
Allocate and transfer a numpy ndarray to the device.
To copy a numpy array host->device:
    ary = numpy.arange(10)
    d_ary = cuda.to_device(ary)
To enqueue the transfer on a stream:
    stream = cuda.stream()
    d_ary = cuda.to_device(ary, stream=stream)
The resulting d_ary is a DeviceNDArray.
To copy device->host:
    hary = d_ary.copy_to_host()
To copy device->host into an existing array:
    ary = numpy.empty(shape=d_ary.shape, dtype=d_ary.dtype)
    d_ary.copy_to_host(ary)
To enqueue the transfer on a stream:
    hary = d_ary.copy_to_host(stream=stream)
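Putting the fragments above together, a self-contained round trip might look like this sketch:
    import numpy
    from numba import cuda

    ary = numpy.arange(10)
    stream = cuda.stream()

    d_ary = cuda.to_device(ary, stream=stream)   # enqueue host -> device copy
    hary = d_ary.copy_to_host(stream=stream)     # enqueue device -> host copy
    stream.synchronize()                         # block until both copies finish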
numba.cuda.codegen module¶
- class numba.cuda.codegen.CUDACodeLibrary(codegen, name)¶
- class numba.cuda.codegen.JITCUDACodegen(module_name)¶
Bases: numba.targets.codegen.BaseCPUCodegen
This codegen implementation for CUDA actually only generates optimized LLVM IR. Generation of PTX code is done separately (see numba.cuda.compiler).
numba.cuda.compiler module¶
- class numba.cuda.compiler.AutoJitCUDAKernel(func, bind, targetoptions)¶
Bases: numba.cuda.compiler.CUDAKernelBase
- specialize(*args)¶
- class numba.cuda.compiler.CUDAKernel(llvm_module, name, argtypes, link=(), debug=False, exceptions={})¶
Bases: numba.cuda.compiler.CUDAKernelBase
- bind()¶
Force binding to current CUDA context
- device¶
Get current active context
- ptx¶
- class numba.cuda.compiler.CUDAKernelBase¶
Bases: object
Define interface for configurable kernels
- configure(griddim, blockdim, stream=0, sharedmem=0)¶
- copy()¶
- class numba.cuda.compiler.CachedCUFunction(entry_name, ptx, linking)¶
Bases: object
Get or compile CUDA function for the current active context
Uses device ID as key for cache.
- get()¶
- get_info()¶
- class numba.cuda.compiler.CachedPTX(llvmir)¶
Bases: object
A PTX cache that uses compute capability as a cache key
- get()¶
Get PTX for the current active context.
- class numba.cuda.compiler.Complex(val)¶
Bases: _ctypes.Structure
- class numba.cuda.compiler.Complex128(val)¶
Bases: numba.cuda.compiler.Complex
- imag¶
Structure/Union member
- real¶
Structure/Union member
- class numba.cuda.compiler.Complex64(val)¶
Bases: numba.cuda.compiler.Complex
- imag¶
Structure/Union member
- real¶
Structure/Union member
- class numba.cuda.compiler.DeviceFunction(cres)¶
Bases: object
- class numba.cuda.compiler.ExternFunction(name, sig)¶
Bases: object
- numba.cuda.compiler.compile_cuda(pyfunc, return_type, args, debug)¶
- numba.cuda.compiler.compile_device(pyfunc, return_type, args, inline=True, debug=False)¶
- numba.cuda.compiler.compile_kernel(pyfunc, args, link, debug=False)¶
- numba.cuda.compiler.declare_device_function(name, restype, argtypes)¶
numba.cuda.cudadecl module¶
- class numba.cuda.cudadecl.CudaAtomicTemplate(context)¶
Bases: numba.typing.templates.AttributeTemplate
- key = Module(<class 'numba.cuda.stubs.atomic'>)¶
- resolve_add(mod)¶
- class numba.cuda.cudadecl.CudaConstModuleTemplate(context)¶
Bases: numba.typing.templates.AttributeTemplate
- key = Module(<class 'numba.cuda.stubs.const'>)¶
- resolve_array_like(mod)¶
- class numba.cuda.cudadecl.CudaLocalModuleTemplate(context)¶
Bases: numba.typing.templates.AttributeTemplate
- key = Module(<class 'numba.cuda.stubs.local'>)¶
- resolve_array(mod)¶
- class numba.cuda.cudadecl.CudaModuleTemplate(context)¶
Bases: numba.typing.templates.AttributeTemplate
- key = Module(<module 'numba.cuda' from '/Users/stan/anaconda/envs/numba_docgen/lib/python2.7/site-packages/numba-0.16.0-py2.7-macosx-10.5-x86_64.egg/numba/cuda/__init__.pyc'>)¶
- resolve_atomic(mod)¶
- resolve_blockDim(mod)¶
- resolve_blockIdx(mod)¶
- resolve_const(mod)¶
- resolve_grid(mod)¶
- resolve_gridDim(mod)¶
- resolve_gridsize(mod)¶
- resolve_local(mod)¶
- resolve_syncthreads(mod)¶
- resolve_threadIdx(mod)¶
- class numba.cuda.cudadecl.Cuda_atomic_add(context)¶
Bases: numba.typing.templates.AbstractTemplate
- generic(args, kws)¶
- key¶
alias of add
- class numba.cuda.cudadecl.Cuda_blockDim(context)¶
Bases: numba.typing.templates.AttributeTemplate
- key = Module(<class 'numba.cuda.stubs.blockDim'>)¶
- resolve_x(mod)¶
- resolve_y(mod)¶
- resolve_z(mod)¶
- class numba.cuda.cudadecl.Cuda_blockDim_x¶
Bases: numba.typing.templates.MacroTemplate
- key = <macro ntid.x -> () -> int32>¶
- class numba.cuda.cudadecl.Cuda_blockDim_y¶
Bases: numba.typing.templates.MacroTemplate
- key = <macro ntid.y -> () -> int32>¶
- class numba.cuda.cudadecl.Cuda_blockDim_z¶
Bases: numba.typing.templates.MacroTemplate
- key = <macro ntid.z -> () -> int32>¶
- class numba.cuda.cudadecl.Cuda_blockIdx(context)¶
Bases: numba.typing.templates.AttributeTemplate
- key = Module(<class 'numba.cuda.stubs.blockIdx'>)¶
- resolve_x(mod)¶
- resolve_y(mod)¶
- resolve_z(mod)¶
- class numba.cuda.cudadecl.Cuda_blockIdx_x¶
Bases: numba.typing.templates.MacroTemplate
- key = <macro ctaid.x -> () -> int32>¶
- class numba.cuda.cudadecl.Cuda_blockIdx_y¶
Bases: numba.typing.templates.MacroTemplate
- key = <macro ctaid.y -> () -> int32>¶
- class numba.cuda.cudadecl.Cuda_blockIdx_z¶
Bases: numba.typing.templates.MacroTemplate
- key = <macro ctaid.z -> () -> int32>¶
- class numba.cuda.cudadecl.Cuda_const_arraylike¶
Bases: numba.typing.templates.MacroTemplate
- key = <macro const.array_like -> <function const_array_like at 0x105bc8758>>¶
- class numba.cuda.cudadecl.Cuda_grid¶
Bases: numba.typing.templates.MacroTemplate
- key = <macro ptx.grid -> <function grid_expand at 0x105bc8500>>¶
- class numba.cuda.cudadecl.Cuda_gridDim(context)¶
Bases: numba.typing.templates.AttributeTemplate
- key = Module(<class 'numba.cuda.stubs.gridDim'>)¶
- resolve_x(mod)¶
- resolve_y(mod)¶
- resolve_z(mod)¶
- class numba.cuda.cudadecl.Cuda_gridDim_x¶
Bases: numba.typing.templates.MacroTemplate
- key = <macro nctaid.x -> () -> int32>¶
- class numba.cuda.cudadecl.Cuda_gridDim_y¶
Bases: numba.typing.templates.MacroTemplate
- key = <macro nctaid.y -> () -> int32>¶
- class numba.cuda.cudadecl.Cuda_gridDim_z¶
Bases: numba.typing.templates.MacroTemplate
- key = <macro nctaid.z -> () -> int32>¶
- class numba.cuda.cudadecl.Cuda_gridsize¶
Bases: numba.typing.templates.MacroTemplate
- key = <macro ptx.gridsize -> <function gridsize_expand at 0x105bc8578>>¶
- class numba.cuda.cudadecl.Cuda_local_array¶
Bases: numba.typing.templates.MacroTemplate
- key = <macro local.array -> <function local_array at 0x105bc86e0>>¶
- class numba.cuda.cudadecl.Cuda_syncthreads(context)¶
Bases: numba.typing.templates.ConcreteTemplate
- cases = [() -> none]¶
- key¶
alias of syncthreads
- class numba.cuda.cudadecl.Cuda_threadIdx(context)¶
Bases: numba.typing.templates.AttributeTemplate
- key = Module(<class 'numba.cuda.stubs.threadIdx'>)¶
- resolve_x(mod)¶
- resolve_y(mod)¶
- resolve_z(mod)¶
- class numba.cuda.cudadecl.Cuda_threadIdx_x¶
Bases: numba.typing.templates.MacroTemplate
- key = <macro tid.x -> () -> int32>¶
- class numba.cuda.cudadecl.Cuda_threadIdx_y¶
Bases: numba.typing.templates.MacroTemplate
- key = <macro tid.y -> () -> int32>¶
- class numba.cuda.cudadecl.Cuda_threadIdx_z¶
Bases: numba.typing.templates.MacroTemplate
- key = <macro tid.z -> () -> int32>¶
numba.cuda.cudaimpl module¶
- numba.cuda.cudaimpl.ptx_atomic_add_intp(context, builder, sig, args)¶
- numba.cuda.cudaimpl.ptx_atomic_add_tuple(context, builder, sig, args)¶
- numba.cuda.cudaimpl.ptx_atomic_add_unituple(context, builder, sig, args)¶
- numba.cuda.cudaimpl.ptx_cmem_arylike(context, builder, sig, args)¶
- numba.cuda.cudaimpl.ptx_grid1d(context, builder, sig, args)¶
- numba.cuda.cudaimpl.ptx_grid2d(context, builder, sig, args)¶
- numba.cuda.cudaimpl.ptx_gridsize1d(context, builder, sig, args)¶
- numba.cuda.cudaimpl.ptx_gridsize2d(context, builder, sig, args)¶
- numba.cuda.cudaimpl.ptx_lmem_alloc_array(context, builder, sig, args)¶
- numba.cuda.cudaimpl.ptx_lmem_alloc_intp(context, builder, sig, args)¶
- numba.cuda.cudaimpl.ptx_smem_alloc_array(context, builder, sig, args)¶
- numba.cuda.cudaimpl.ptx_smem_alloc_intp(context, builder, sig, args)¶
- numba.cuda.cudaimpl.ptx_sreg_template(sreg)¶
- numba.cuda.cudaimpl.ptx_syncthreads(context, builder, sig, args)¶
numba.cuda.cudamath module¶
- class numba.cuda.cudamath.MathModuleAttribute(context)¶
Bases: numba.typing.templates.AttributeTemplate
- key = Module(<module 'math' from '/Users/stan/anaconda/envs/numba_docgen/lib/python2.7/lib-dynload/math.so'>)¶
- resolve_acos(mod)¶
- resolve_acosh(mod)¶
- resolve_asin(mod)¶
- resolve_asinh(mod)¶
- resolve_atan(mod)¶
- resolve_atan2(mod)¶
- resolve_atanh(mod)¶
- resolve_ceil(mod)¶
- resolve_copysign(mod)¶
- resolve_cos(mod)¶
- resolve_cosh(mod)¶
- resolve_degrees(mod)¶
- resolve_e(mod)¶
- resolve_exp(mod)¶
- resolve_expm1(mod)¶
- resolve_fabs(mod)¶
- resolve_floor(mod)¶
- resolve_fmod(mod)¶
- resolve_isinf(mod)¶
- resolve_isnan(mod)¶
- resolve_log(mod)¶
- resolve_log10(mod)¶
- resolve_log1p(mod)¶
- resolve_pi(mod)¶
- resolve_pow(mod)¶
- resolve_radians(mod)¶
- resolve_sin(mod)¶
- resolve_sinh(mod)¶
- resolve_sqrt(mod)¶
- resolve_tan(mod)¶
- resolve_tanh(mod)¶
- resolve_trunc(mod)¶
- class numba.cuda.cudamath.Math_acos(context)¶
Bases: numba.cuda.cudamath.Math_unary
- key()¶
acos(x)
Return the arc cosine (measured in radians) of x.
- class numba.cuda.cudamath.Math_acosh(context)¶
Bases: numba.cuda.cudamath.Math_unary
- key()¶
acosh(x)
Return the hyperbolic arc cosine (measured in radians) of x.
- class numba.cuda.cudamath.Math_asin(context)¶
Bases: numba.cuda.cudamath.Math_unary
- key()¶
asin(x)
Return the arc sine (measured in radians) of x.
- class numba.cuda.cudamath.Math_asinh(context)¶
Bases: numba.cuda.cudamath.Math_unary
- key()¶
asinh(x)
Return the hyperbolic arc sine (measured in radians) of x.
- class numba.cuda.cudamath.Math_atan(context)¶
Bases: numba.cuda.cudamath.Math_unary
- key()¶
atan(x)
Return the arc tangent (measured in radians) of x.
- class numba.cuda.cudamath.Math_atan2(context)¶
Bases: numba.typing.templates.ConcreteTemplate
- cases = [(int64, int64) -> float64, (uint64, uint64) -> float64, (float32, float32) -> float32, (float64, float64) -> float64]¶
- key()¶
atan2(y, x)
Return the arc tangent (measured in radians) of y/x. Unlike atan(y/x), the signs of both x and y are considered.
- class numba.cuda.cudamath.Math_atanh(context)¶
Bases: numba.cuda.cudamath.Math_unary
- key()¶
atanh(x)
Return the hyperbolic arc tangent (measured in radians) of x.
- class numba.cuda.cudamath.Math_binary(context)¶
Bases: numba.typing.templates.ConcreteTemplate
- cases = [(float32, float32) -> float32, (float64, float64) -> float64]¶
- class numba.cuda.cudamath.Math_ceil(context)¶
Bases: numba.cuda.cudamath.Math_unary
- key()¶
ceil(x)
Return the ceiling of x as a float. This is the smallest integral value >= x.
- class numba.cuda.cudamath.Math_copysign(context)¶
Bases: numba.cuda.cudamath.Math_binary
- key()¶
copysign(x, y)
Return x with the sign of y.
- class numba.cuda.cudamath.Math_cos(context)¶
Bases: numba.cuda.cudamath.Math_unary
- key()¶
cos(x)
Return the cosine of x (measured in radians).
- class numba.cuda.cudamath.Math_cosh(context)¶
Bases: numba.cuda.cudamath.Math_unary
- key()¶
cosh(x)
Return the hyperbolic cosine of x.
- class numba.cuda.cudamath.Math_degrees(context)¶
Bases: numba.cuda.cudamath.Math_unary
- key()¶
degrees(x)
Convert angle x from radians to degrees.
- class numba.cuda.cudamath.Math_exp(context)¶
Bases: numba.cuda.cudamath.Math_unary
- key()¶
exp(x)
Return e raised to the power of x.
- class numba.cuda.cudamath.Math_expm1(context)¶
Bases: numba.cuda.cudamath.Math_unary
- key()¶
expm1(x)
Return exp(x)-1. This function avoids the loss of precision involved in the direct evaluation of exp(x)-1 for small x.
- class numba.cuda.cudamath.Math_fabs(context)¶
Bases: numba.cuda.cudamath.Math_unary
- key()¶
fabs(x)
Return the absolute value of the float x.
- class numba.cuda.cudamath.Math_floor(context)¶
Bases: numba.cuda.cudamath.Math_unary
- key()¶
floor(x)
Return the floor of x as a float. This is the largest integral value <= x.
- class numba.cuda.cudamath.Math_fmod(context)¶
Bases: numba.cuda.cudamath.Math_binary
- key()¶
fmod(x, y)
Return fmod(x, y), according to platform C. x % y may differ.
- class numba.cuda.cudamath.Math_isinf(context)¶
Bases: numba.typing.templates.ConcreteTemplate
- cases = [(int64,) -> bool, (uint64,) -> bool, (float32,) -> bool, (float64,) -> bool]¶
- key()¶
isinf(x) -> bool
Check if float x is infinite (positive or negative).
- class numba.cuda.cudamath.Math_isnan(context)¶
Bases: numba.typing.templates.ConcreteTemplate
- cases = [(int64,) -> bool, (uint64,) -> bool, (float32,) -> bool, (float64,) -> bool]¶
- key()¶
isnan(x) -> bool
Check if float x is not a number (NaN).
- class numba.cuda.cudamath.Math_log(context)¶
Bases: numba.cuda.cudamath.Math_unary
- key()¶
log(x[, base])
Return the logarithm of x to the given base. If the base is not specified, returns the natural logarithm (base e) of x.
- class numba.cuda.cudamath.Math_log10(context)¶
Bases: numba.cuda.cudamath.Math_unary
- key()¶
log10(x)
Return the base 10 logarithm of x.
- class numba.cuda.cudamath.Math_log1p(context)¶
Bases: numba.cuda.cudamath.Math_unary
- key()¶
log1p(x)
Return the natural logarithm of 1+x (base e). The result is computed in a way which is accurate for x near zero.
- class numba.cuda.cudamath.Math_pow(context)¶
Bases: numba.typing.templates.ConcreteTemplate
- cases = [(float32, float32) -> float32, (float64, float64) -> float64, (float32, int32) -> float32, (float64, int32) -> float64]¶
- key()¶
pow(x, y)
Return x**y (x to the power of y).
- class numba.cuda.cudamath.Math_radians(context)¶
Bases: numba.cuda.cudamath.Math_unary
- key()¶
radians(x)
Convert angle x from degrees to radians.
- class numba.cuda.cudamath.Math_sin(context)¶
Bases: numba.cuda.cudamath.Math_unary
- key()¶
sin(x)
Return the sine of x (measured in radians).
- class numba.cuda.cudamath.Math_sinh(context)¶
Bases: numba.cuda.cudamath.Math_unary
- key()¶
sinh(x)
Return the hyperbolic sine of x.
- class numba.cuda.cudamath.Math_sqrt(context)¶
Bases: numba.cuda.cudamath.Math_unary
- key()¶
sqrt(x)
Return the square root of x.
- class numba.cuda.cudamath.Math_tan(context)¶
Bases: numba.cuda.cudamath.Math_unary
- key()¶
tan(x)
Return the tangent of x (measured in radians).
- class numba.cuda.cudamath.Math_tanh(context)¶
Bases: numba.cuda.cudamath.Math_unary
- key()¶
tanh(x)
Return the hyperbolic tangent of x.
- class numba.cuda.cudamath.Math_trunc(context)¶
Bases: numba.cuda.cudamath.Math_unary
- key()¶
trunc(x:Real) -> Integral
Truncates x to the nearest Integral toward 0. Uses the __trunc__ magic method.
- class numba.cuda.cudamath.Math_unary(context)¶
Bases: numba.typing.templates.ConcreteTemplate
- cases = [(int64,) -> float64, (uint64,) -> float64, (float32,) -> float32, (float64,) -> float64]¶
numba.cuda.decorators module¶
- numba.cuda.decorators.autojit(func, **kws)¶
JIT at call site. A function signature is not needed as this will capture the types at call time. Each signature of the kernel is cached for future use.
Note
Can only compile a CUDA kernel.
Example:
    import numpy

    @cuda.autojit
    def foo(aryA, aryB):
        ...

    aryA = numpy.arange(10, dtype=numpy.int32)
    aryB = numpy.arange(10, dtype=numpy.float32)
    foo[griddim, blockdim](aryA, aryB)
In the above code, a version of foo with the signature “void(int32[:], float32[:])” is compiled.
- numba.cuda.decorators.convert_types(restype, argtypes)¶
- numba.cuda.decorators.declare_device(name, restype=None, argtypes=None)¶
- numba.cuda.decorators.jit(restype=None, argtypes=None, device=False, inline=False, bind=True, link=[], debug=False, **kws)¶
JIT compile a Python function conforming to the CUDA-Python specification.
To define a CUDA kernel that takes two int 1D-arrays:
    @cuda.jit('void(int32[:], int32[:])')
    def foo(aryA, aryB):
        ...
Note
A kernel cannot have any return value.
To launch the cuda kernel:
    griddim = 1, 2
    blockdim = 3, 4
    foo[griddim, blockdim](aryA, aryB)
griddim is the number of thread blocks per grid. It can be:
- an int;
- a 1-tuple of ints;
- a 2-tuple of ints.
blockdim is the number of threads per block. It can be:
- an int;
- a 1-tuple of ints;
- a 2-tuple of ints;
- a 3-tuple of ints.
The above code is equivalent to the following CUDA C:
    dim3 griddim(1, 2);
    dim3 blockdim(3, 4);
    foo<<<griddim, blockdim>>>(aryA, aryB);
To access the compiled PTX code:
print foo.ptx
To define a CUDA device function that takes two ints and returns an int:
    @cuda.jit('int32(int32, int32)', device=True)
    def bar(a, b):
        ...
To force inlining of the device function:
    @cuda.jit('int32(int32, int32)', device=True, inline=True)
    def bar_forced_inline(a, b):
        ...
A device function can only be used inside another kernel. It cannot be called from the host.
Using bar in a CUDA kernel:
    @cuda.jit('void(int32[:], int32[:], int32[:])')
    def use_bar(aryA, aryB, aryOut):
        i = cuda.grid(1)  # global position of the thread for a 1D grid.
        aryOut[i] = bar(aryA[i], aryB[i])
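Putting the pieces together, a self-contained sketch of compiling and launching a kernel; the add kernel, array sizes and launch configuration are arbitrary choices, not part of the API:
    import numpy as np
    from numba import cuda

    @cuda.jit('void(int32[:], int32[:], int32[:])')
    def add(aryA, aryB, aryOut):
        i = cuda.grid(1)
        if i < aryOut.shape[0]:
            aryOut[i] = aryA[i] + aryB[i]

    n = 1024
    aryA = np.arange(n, dtype=np.int32)
    aryB = np.arange(n, dtype=np.int32)
    aryOut = np.zeros(n, dtype=np.int32)

    griddim = (n + 255) // 256   # enough blocks to cover n elements
    blockdim = 256
    add[griddim, blockdim](aryA, aryB, aryOut)  # host arrays are copied in and out automatically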
numba.cuda.descriptor module¶
- class numba.cuda.descriptor.CPUTargetOptions¶
Bases: numba.targets.options.TargetOptions
- OPTIONS = {}¶
- class numba.cuda.descriptor.CUDATargetDesc¶
Bases: numba.targets.descriptors.TargetDescriptor
- options¶
alias of CPUTargetOptions
- targetctx = <numba.cuda.target.CUDATargetContext object at 0x10dd17b50>¶
- typingctx = <numba.cuda.target.CUDATypingContext object at 0x10dd17b90>¶
numba.cuda.dispatcher module¶
- class numba.cuda.dispatcher.CUDADispatcher(py_func, locals={}, targetoptions={})¶
Bases: object
- compile(sig, locals={}, **targetoptions)¶
- compiled¶
- configure(*args, **kws)¶
- disable_compile(val=True)¶
Disable the compilation of new signatures at call time.
- targetdescr¶
alias of CUDATarget
- class numba.cuda.dispatcher.CUDATarget¶
Bases: numba.targets.descriptors.TargetDescriptor
- options¶
alias of CUDATargetOptions
- class numba.cuda.dispatcher.CUDATargetOptions¶
Bases: numba.targets.options.TargetOptions
- OPTIONS = {}¶
numba.cuda.errors module¶
- exception numba.cuda.errors.KernelRuntimeError(msg, tid=None, ctaid=None)¶
Bases: exceptions.RuntimeError
numba.cuda.initialize module¶
- numba.cuda.initialize.init_jit()¶
- numba.cuda.initialize.initialize_all()¶
numba.cuda.libdevice module¶
- numba.cuda.libdevice.binary_implement(nvname, ty)¶
- numba.cuda.libdevice.bool_implement(nvname, ty)¶
- numba.cuda.libdevice.impl32(context, builder, sig, args)¶
- numba.cuda.libdevice.impl64(context, builder, sig, args)¶
- numba.cuda.libdevice.powi_implement(nvname)¶
- numba.cuda.libdevice.unary_implement(nvname, ty)¶
numba.cuda.nvvmutils module¶
- class numba.cuda.nvvmutils.SRegBuilder(builder)¶
Bases: object
- ctaid(xyz)¶
- getdim(xyz)¶
- nctaid(xyz)¶
- ntid(xyz)¶
- tid(xyz)¶
- numba.cuda.nvvmutils.call_sreg(builder, name)¶
- numba.cuda.nvvmutils.declare_atomic_add_float32(lmod)¶
- numba.cuda.nvvmutils.declare_string(builder, value)¶
- numba.cuda.nvvmutils.get_global_id(builder, dim)¶
- numba.cuda.nvvmutils.insert_addrspace_conv(lmod, elemtype, addrspace)¶
numba.cuda.stubs module¶
This script specifies all the PTX special objects.
- class numba.cuda.stubs.Stub¶
Bases: object
A stub object to represent special objects that are meaningless outside the context of CUDA-Python.
- class numba.cuda.stubs.atomic¶
Bases: numba.cuda.stubs.Stub
atomic namespace
- class add¶
Bases: numba.cuda.stubs.Stub
add(ary, idx, val)
Perform atomic ary[idx] += val
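A hedged sketch of cuda.atomic.add building a histogram so that concurrent threads can increment the same bin safely; the histogram kernel and the int32 operand type are illustrative assumptions:
    import numpy as np
    from numba import cuda

    @cuda.jit('void(int32[:], int32[:])')
    def histogram(data, bins):
        i = cuda.grid(1)
        if i < data.shape[0]:
            cuda.atomic.add(bins, data[i], 1)   # bins[data[i]] += 1, atomically

    data = np.random.randint(0, 16, size=4096).astype(np.int32)
    bins = np.zeros(16, dtype=np.int32)
    histogram[16, 256](data, bins)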
- class numba.cuda.stubs.blockDim¶
Bases: numba.cuda.stubs.Stub
blockDim.{x, y, z}
- x = <macro ntid.x -> () -> int32>¶
- y = <macro ntid.y -> () -> int32>¶
- z = <macro ntid.z -> () -> int32>¶
- class numba.cuda.stubs.blockIdx¶
Bases: numba.cuda.stubs.Stub
blockIdx.{x, y, z}
- x = <macro ctaid.x -> () -> int32>¶
- y = <macro ctaid.y -> () -> int32>¶
- z = <macro ctaid.z -> () -> int32>¶
- class numba.cuda.stubs.const¶
Bases: numba.cuda.stubs.Stub
constant memory namespace
- array_like = <macro const.array_like -> <function const_array_like at 0x105bc8758>>¶
- numba.cuda.stubs.const_array_like(ndarray)¶
- class numba.cuda.stubs.gridDim¶
Bases: numba.cuda.stubs.Stub
gridDim.{x, y, z}
- x = <macro nctaid.x -> () -> int32>¶
- y = <macro nctaid.y -> () -> int32>¶
- z = <macro nctaid.z -> () -> int32>¶
- numba.cuda.stubs.grid_expand(ndim)¶
grid(ndim)
ndim: [int] 1 or 2
    if ndim == 1:
        return cuda.threadIdx.x + cuda.blockIdx.x * cuda.blockDim.x
    elif ndim == 2:
        x = cuda.threadIdx.x + cuda.blockIdx.x * cuda.blockDim.x
        y = cuda.threadIdx.y + cuda.blockIdx.y * cuda.blockDim.y
        return x, y
- numba.cuda.stubs.gridsize_expand(ndim)¶
gridsize(ndim)
ndim: [int] 1 or 2
    if ndim == 1:
        return cuda.blockDim.x * cuda.gridDim.x
    elif ndim == 2:
        x = cuda.blockDim.x * cuda.gridDim.x
        y = cuda.blockDim.y * cuda.gridDim.y
        return x, y
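Together, grid() and gridsize() support the common grid-stride loop pattern; a minimal sketch (the scale kernel is hypothetical):
    import numpy as np
    from numba import cuda

    @cuda.jit('void(float32[:])')
    def scale(a):
        start = cuda.grid(1)       # this thread's global index
        step = cuda.gridsize(1)    # total number of threads in the grid
        for i in range(start, a.shape[0], step):
            a[i] *= 2.0

    a = np.ones(10000, dtype=np.float32)
    scale[8, 128](a)   # fewer threads than elements; the loop covers the rest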
- class numba.cuda.stubs.local¶
Bases: numba.cuda.stubs.Stub
local memory namespace
- array = <macro local.array -> <function local_array at 0x105bc86e0>>¶
- numba.cuda.stubs.local_array(shape, dtype)¶
- class numba.cuda.stubs.syncthreads¶
Bases: numba.cuda.stubs.Stub
syncthreads()
Synchronizes all threads in the thread block.
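A sketch of syncthreads() guarding a shared-memory exchange; cuda.shared.array is part of the CUDA target but not documented on this page, so its use here is an assumption:
    import numpy as np
    from numba import cuda, float32

    @cuda.jit('void(float32[:], float32[:])')
    def reverse_block(inp, out):
        buf = cuda.shared.array(shape=256, dtype=float32)
        i = cuda.threadIdx.x
        buf[i] = inp[i]         # each thread stages one element
        cuda.syncthreads()      # wait until the whole block has written buf
        out[i] = buf[255 - i]   # now safe to read another thread's element

    inp = np.arange(256, dtype=np.float32)
    out = np.zeros_like(inp)
    reverse_block[1, 256](inp, out)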
- class numba.cuda.stubs.threadIdx¶
Bases: numba.cuda.stubs.Stub
threadIdx.{x, y, z}
- x = <macro tid.x -> () -> int32>¶
- y = <macro tid.y -> () -> int32>¶
- z = <macro tid.z -> () -> int32>¶
numba.cuda.target module¶
- class numba.cuda.target.CUDATargetContext(typing_context)¶
Bases: numba.targets.base.BaseContext
- generate_kernel_wrapper(func, argtypes)¶
- implement_powi_as_math_call = True¶
- init()¶
- insert_string_const_addrspace(builder, string)¶
Insert a constant string in the constant address space and return a generic i8 pointer to the data.
This function attempts to deduplicate.
- jit_codegen()¶
- make_constant_array(builder, typ, ary)¶
Return dummy value.
XXX: We should be able to move cuda.const.array_like into here.
- mangler(name, argtypes)¶
- optimize_function(func)¶
Run O1 function passes
- prepare_cuda_kernel(func, argtypes)¶
- strict_alignment = True¶
- target_data¶
- class numba.cuda.target.CUDATypingContext¶
Bases: numba.typing.context.BaseContext
- init()¶