numba.cuda.cudadrv package¶
Submodules¶
numba.cuda.cudadrv.devicearray module¶
A CUDA ND Array is recognized by checking the __cuda_memory__ attribute on the object. If it exists and evaluates to True, the object must define shape, strides, dtype and size attributes similar to a NumPy ndarray.
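For illustration, a minimal sketch (with a hypothetical class name) of an object satisfying this recognition protocol:

```python
import numpy as np

class MinimalCudaArray(object):
    """Hypothetical object implementing the CUDA ND Array protocol."""
    __cuda_memory__ = True                 # marks the object as CUDA memory
    def __init__(self, shape, dtype):
        self.shape = shape
        self.dtype = np.dtype(dtype)
        self.size = int(np.prod(shape))
        # C-contiguous strides in bytes, as a NumPy ndarray would report
        self.strides = tuple(
            int(np.prod(shape[i + 1:])) * self.dtype.itemsize
            for i in range(len(shape)))
```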
- class numba.cuda.cudadrv.devicearray.DeviceNDArray(shape, strides, dtype, stream=0, writeback=None, gpu_head=None, gpu_data=None)¶
Bases: numba.cuda.cudadrv.devicearray.DeviceNDArrayBase
- getitem(item, stream=0)¶
Do __getitem__(item) with the given CUDA stream.
- is_c_contiguous()¶
- is_f_contiguous()¶
- ravel(order='C', stream=0)¶
- class numba.cuda.cudadrv.devicearray.DeviceNDArrayBase(shape, strides, dtype, stream=0, writeback=None, gpu_head=None, gpu_data=None)¶
Bases: object
An on-GPU NDArray representation
- as_cuda_arg()¶
Returns a device memory object that is used as the argument.
- bind(stream=0)¶
Bind a CUDA stream to this object so that all subsequent operations on this array default to the given stream.
- copy_to_device(ary, stream=0)¶
Copy ary to self.
If ary is CUDA memory, perform a device-to-device transfer. Otherwise, perform a host-to-device transfer.
- copy_to_host(ary=None, stream=0)¶
Copy self to ary or create a new numpy ndarray if ary is None.
Always returns the host array.
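A minimal round-trip sketch using the copy methods above; it assumes a working CUDA context and uses from_array_like (documented below) to allocate the device array:

```python
import numpy as np
from numba.cuda.cudadrv import devicearray

host = np.arange(16, dtype=np.float32)
dary = devicearray.from_array_like(host)  # uninitialized device array like `host`
dary.copy_to_device(host)                 # host-to-device transfer
out = dary.copy_to_host()                 # allocates and returns a new ndarray
assert np.all(out == host)
```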
- device_ctypes_pointer¶
Returns the ctypes pointer to the GPU data buffer
- split(section, stream=0)¶
Split the array into equal partitions of the given section size. If the array cannot be equally divided, the last section will be smaller.
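For example, a sketch of splitting a 10-element device array into sections of 4 (the trailing section holds the remaining 2 elements):

```python
import numpy as np
from numba.cuda.cudadrv import devicearray

host = np.arange(10, dtype=np.float32)
dary = devicearray.from_array_like(host)
dary.copy_to_device(host)
parts = list(dary.split(4))   # section lengths: 4, 4, 2
```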
- to_host(stream=0)¶
- class numba.cuda.cudadrv.devicearray.MappedNDArray(shape, strides, dtype, stream=0, writeback=None, gpu_head=None, gpu_data=None)¶
Bases: numba.cuda.cudadrv.devicearray.DeviceNDArrayBase, numpy.ndarray
A host array that uses CUDA mapped memory.
- device_setup(gpu_data, stream=0)¶
- numba.cuda.cudadrv.devicearray.auto_device(ary, stream=0, copy=True)¶
- numba.cuda.cudadrv.devicearray.from_array_like(ary, stream=0, gpu_head=None, gpu_data=None)¶
Create a DeviceNDArray object that is like ary.
- numba.cuda.cudadrv.devicearray.is_cuda_ndarray(obj)¶
Check if an object is a CUDA ndarray
- numba.cuda.cudadrv.devicearray.require_cuda_ndarray(obj)¶
Raises ValueError if is_cuda_ndarray(obj) evaluates to False
- numba.cuda.cudadrv.devicearray.sentry_contiguous(ary)¶
- numba.cuda.cudadrv.devicearray.verify_cuda_ndarray_interface(obj)¶
Verify the CUDA ndarray interface of obj
numba.cuda.cudadrv.devices module¶
Expose each GPU device directly
- class numba.cuda.cudadrv.devices.GPU(gpu)¶
Bases: object
Proxy into driver.Device
- context¶
- pop()¶
- push()¶
- reset()¶
- numba.cuda.cudadrv.devices.get_context(devnum=0)¶
Get the current device or use a device by device number, and return the CUDA context.
- numba.cuda.cudadrv.devices.get_gpu(i)¶
- numba.cuda.cudadrv.devices.require_context(fn)¶
A decorator to ensure a context for the CUDA subsystem
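A sketch of typical decorator usage; the function body is hypothetical:

```python
from numba.cuda.cudadrv import devices

@devices.require_context
def device_memory_info():
    # get_context() returns the CUDA context for the current device
    return devices.get_context().get_memory_info()
```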
- numba.cuda.cudadrv.devices.reset()¶
numba.cuda.cudadrv.driver module¶
CUDA driver bridge implementation
NOTE: The new driver implementation uses a “trashing service” that helps prevent crashing the system (particularly on OSX) when the CUDA context is corrupted at resource deallocation. The old approach tied resource management directly into the object destructor; thus, once the CUDA context was corrupted, subsequent deallocations could corrupt it further and cause the system to freeze in some cases.
- class numba.cuda.cudadrv.driver.Context(device, handle, finalizer=None)¶
Bases: object
This object is tied to the lifetime of the actual context resource.
This object is usually wrapped in a weakref proxy for users; users seldom own this object directly.
- create_event(timing=True)¶
- create_module_image(image)¶
- create_module_ptx(ptx)¶
- create_stream()¶
- get_memory_info()¶
Returns (free, total) memory in bytes in the context.
- memalloc(bytesize)¶
- memfree(pointer)¶
- memhostalloc(bytesize, mapped=False, portable=False, wc=False)¶
- mempin(owner, pointer, size, mapped=False)¶
- memunpin(pointer)¶
- pop()¶
Pop context
- push()¶
Push context
- reset()¶
Clean up all owned resources in this context
- synchronize()¶
- unload_module(module)¶
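A minimal sketch exercising a few of the methods above; it assumes a CUDA-capable device and obtains the context via numba.cuda.cudadrv.devices.get_context():

```python
from numba.cuda.cudadrv import devices

ctx = devices.get_context()
free, total = ctx.get_memory_info()   # both in bytes
buf = ctx.memalloc(1 << 20)           # 1 MiB device allocation
ctx.synchronize()                     # wait for pending work on the context
```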
- exception numba.cuda.cudadrv.driver.CudaAPIError(code, msg)¶
- exception numba.cuda.cudadrv.driver.DeadMemoryError¶
Bases: exceptions.RuntimeError
- class numba.cuda.cudadrv.driver.Device(devnum)¶
Bases: object
The device object owns the CUDA contexts. This is owned by the driver object. Users should not construct devices directly.
- COMPUTE_CAPABILITY¶
For backward compatibility
- close_all_context()¶
- create_context()¶
- get_context()¶
- get_or_create_context()¶
- reset()¶
- class numba.cuda.cudadrv.driver.Driver¶
Bases: object
Driver API functions are lazily bound.
- get_device(devnum=0)¶
- get_device_count()¶
- initialize()¶
- is_available¶
- list_devices()¶
Returns a list of active devices
- reset()¶
Reset all devices
- class numba.cuda.cudadrv.driver.Event(context, handle, finalizer=None)¶
Bases: object
- elapsed_time(evtend)¶
- query()¶
Returns True if all work before the most recent record has completed; otherwise, returns False.
- record(stream=0)¶
Record the event on the given stream.
- synchronize()¶
Synchronize the host thread for the completion of the event.
- wait(stream=0)¶
All future work submitted to the stream will wait until the event completes.
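A sketch of timing device work with a pair of events; per the CUDA event API, elapsed times are reported in milliseconds:

```python
from numba.cuda.cudadrv import devices

ctx = devices.get_context()
start = ctx.create_event(timing=True)
end = ctx.create_event(timing=True)
start.record()
# ... submit device work here ...
end.record()
end.synchronize()                # block until `end` has completed
ms = start.elapsed_time(end)     # elapsed time in milliseconds
```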
- class numba.cuda.cudadrv.driver.FuncAttr¶
Bases: tuple
FuncAttr(regs, shared, local, const, maxthreads)
- const¶
Alias for field number 3
- local¶
Alias for field number 2
- maxthreads¶
Alias for field number 4
- regs¶
Alias for field number 0
- shared¶
Alias for field number 1
- class numba.cuda.cudadrv.driver.Function(module, handle, name)¶
Bases: object
- blockdim = (1, 1, 1)¶
- cache_config(prefer_equal=False, prefer_cache=False, prefer_shared=False)¶
- configure(griddim, blockdim, sharedmem=0, stream=0)¶
- device¶
- griddim = (1, 1, 1)¶
- stream = 0¶
- class numba.cuda.cudadrv.driver.Linker¶
Bases: object
- add_file(path, kind)¶
- add_file_guess_ext(path)¶
- add_ptx(ptx, name='<cudapy-ptx>')¶
- complete()¶
Returns (cubin, size).
cubin is a pointer to an internal buffer owned by the linker; thus, it should be loaded before the linker is destroyed.
- error_log¶
- info_log¶
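A sketch of the linking flow; ptx_source is a placeholder for real PTX text:

```python
from numba.cuda.cudadrv.driver import Linker

ptx_source = b"..."              # placeholder: real PTX text goes here
linker = Linker()
linker.add_ptx(ptx_source)
cubin, size = linker.complete()  # buffer is owned by the linker
# Load the image while `linker` is still alive (see the note above),
# e.g. via Context.create_module_image().
```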
- exception numba.cuda.cudadrv.driver.LinkerError¶
Bases: exceptions.RuntimeError
- class numba.cuda.cudadrv.driver.MappedMemory(context, owner, hostpointer, size, finalizer=None)¶
Bases: numba.cuda.cudadrv.driver.MemoryPointer
- own()¶
- class numba.cuda.cudadrv.driver.MappedOwnedPointer(memptr, view=None)¶
Bases: numba.cuda.cudadrv.driver.OwnedPointer, mviewbuf.MemAlloc
- class numba.cuda.cudadrv.driver.MemoryPointer(context, pointer, size, finalizer=None)¶
Bases: object
- device_ctypes_pointer¶
- free()¶
Forces the device memory to the trash.
- memset(byte, count=None, stream=0)¶
- own()¶
- view(start, stop=None)¶
- class numba.cuda.cudadrv.driver.Module(context, handle, info_log, finalizer=None)¶
Bases: object
- get_function(name)¶
- get_global_symbol(name)¶
- unload()¶
- class numba.cuda.cudadrv.driver.OwnedPointer(memptr, view=None)¶
Bases: object
- class numba.cuda.cudadrv.driver.PinnedMemory(context, owner, pointer, size, finalizer=None)¶
Bases: mviewbuf.MemAlloc
- own()¶
- unpin()¶
- class numba.cuda.cudadrv.driver.Stream(context, handle, finalizer)¶
Bases: object
- auto_synchronize(*args, **kwds)¶
- synchronize()¶
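A sketch using auto_synchronize() as a context manager so the stream is synchronized on exit:

```python
from numba.cuda.cudadrv import devices

ctx = devices.get_context()
stream = ctx.create_stream()
with stream.auto_synchronize():  # synchronizes the stream when the block exits
    pass                         # enqueue asynchronous work on `stream` here
```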
- class numba.cuda.cudadrv.driver.TrashService(name='unnamed', arg=None)¶
Bases: numba.servicelib.service.Service
We need this to enqueue things to be removed. There are times when you want to disable deallocation, because deallocating immediately would break asynchronous work queues.
- CLEAN_LIMIT = 20¶
- add_trash(item)¶
- clear()¶
- defer_cleanup(*args, **kwds)¶
- process(_arg)¶
- numba.cuda.cudadrv.driver.device_ctypes_pointer(obj)¶
Get the ctypes object for the device pointer
- numba.cuda.cudadrv.driver.device_extents(devmem)¶
Find the extents (half open begin and end pointer) of the underlying device memory allocation.
NOTE: it always returns the extents of the allocation, not the extents of the device memory view, which can be a subsection of the entire allocation.
- numba.cuda.cudadrv.driver.device_memory_depends(devmem, *objs)¶
Add dependencies to the device memory.
Mainly used for creating structures that point to other device memory, so that the referents are not garbage-collected and released.
- numba.cuda.cudadrv.driver.device_memory_size(devmem)¶
Check the memory size of the device memory. The result is cached in the device memory object. It may query the driver for the size of the underlying allocation.
- numba.cuda.cudadrv.driver.device_memset(dst, val, size, stream=0)¶
Memset on the device. If stream is not zero, asynchronous mode is used.
dst: device memory
val: byte value to be written
size: number of bytes to be written
stream: a CUDA stream
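For example, a sketch that zero-fills a fresh 256-byte allocation on the default stream:

```python
from numba.cuda.cudadrv import devices
from numba.cuda.cudadrv.driver import device_memset

ctx = devices.get_context()
buf = ctx.memalloc(256)
device_memset(buf, 0, 256)   # synchronously write 256 zero bytes
```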
- numba.cuda.cudadrv.driver.device_pointer(obj)¶
Get the device pointer as an integer
- numba.cuda.cudadrv.driver.device_pointer_type(devmem)¶
Query the device pointer type: host, device, array, unified?
- numba.cuda.cudadrv.driver.device_to_device(dst, src, size, stream=0)¶
NOTE: The underlying data pointer from the host data buffer is used; it should not be changed until the operation, which can be asynchronous, completes.
- numba.cuda.cudadrv.driver.device_to_host(dst, src, size, stream=0)¶
NOTE: The underlying data pointer from the host data buffer is used; it should not be changed until the operation, which can be asynchronous, completes.
- numba.cuda.cudadrv.driver.event_elapsed_time(evtstart, evtend)¶
- numba.cuda.cudadrv.driver.find_driver()¶
- numba.cuda.cudadrv.driver.host_memory_extents(obj)¶
Returns (start, end), the start and end pointers of the array (half-open).
- numba.cuda.cudadrv.driver.host_memory_size(obj)¶
Get the size of the memory
- numba.cuda.cudadrv.driver.host_pointer(obj)¶
NOTE: The underlying data pointer from the host data buffer is used; it should not be changed until the operation, which can be asynchronous, completes.
- numba.cuda.cudadrv.driver.host_to_device(dst, src, size, stream=0)¶
NOTE: The underlying data pointer from the host data buffer is used; it should not be changed until the operation, which can be asynchronous, completes.
- numba.cuda.cudadrv.driver.is_device_memory(obj)¶
All CUDA memory objects are recognized as instances with the attribute “__cuda_memory__” defined and evaluating to True.
All CUDA memory objects should also define an attribute named “device_pointer” whose value is an int (or long) object carrying the pointer value of the device memory address. This is not tested in this method.
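A sketch of the duck-typed check; it assumes allocations returned by Context.memalloc() carry the __cuda_memory__ attribute:

```python
from numba.cuda.cudadrv import devices
from numba.cuda.cudadrv.driver import is_device_memory, require_device_memory

buf = devices.get_context().memalloc(64)
assert is_device_memory(buf)           # device allocations pass the check
assert not is_device_memory(object())  # arbitrary objects do not
require_device_memory(buf)             # sentry form: raises if the check fails
```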
- numba.cuda.cudadrv.driver.launch_kernel(cufunc_handle, griddim, blockdim, sharedmem, hstream, args)¶
- numba.cuda.cudadrv.driver.load_module_image(context, image)¶
image must be a pointer
- numba.cuda.cudadrv.driver.memory_size_from_info(shape, strides, itemsize)¶
Get the byte size of a contiguous memory buffer given the shape, strides and itemsize.
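A worked example for a C-contiguous 3x4 buffer of 4-byte items:

```python
from numba.cuda.cudadrv.driver import memory_size_from_info

shape, itemsize = (3, 4), 4
strides = (16, 4)   # C-contiguous strides in bytes
# last byte offset + itemsize: (3-1)*16 + (4-1)*4 + 4 = 48
print(memory_size_from_info(shape, strides, itemsize))   # expected: 48
```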
- numba.cuda.cudadrv.driver.met_requirement_for_device(device)¶
- numba.cuda.cudadrv.driver.profile_start()¶
- numba.cuda.cudadrv.driver.profile_stop()¶
- numba.cuda.cudadrv.driver.profiling(*args, **kwds)¶
Experimental profiling context.
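A sketch of wrapping a region in the experimental profiling context; the pairing with profile_start()/profile_stop() above is an assumption based on their placement here:

```python
from numba.cuda.cudadrv.driver import profiling

with profiling():
    pass   # work launched inside this block is captured by the profiler
```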
- numba.cuda.cudadrv.driver.require_device_memory(obj)¶
A sentry for methods that accept CUDA memory objects.
numba.cuda.cudadrv.drvapi module¶
numba.cuda.cudadrv.enums module¶
Enum values for CUDA driver
numba.cuda.cudadrv.error module¶
- exception numba.cuda.cudadrv.error.CudaDriverError¶
Bases: exceptions.Exception
- exception numba.cuda.cudadrv.error.CudaSupportError¶
Bases: exceptions.ImportError
- exception numba.cuda.cudadrv.error.NvvmError¶
Bases: exceptions.Exception
- exception numba.cuda.cudadrv.error.NvvmSupportError¶
Bases: exceptions.ImportError
numba.cuda.cudadrv.libs module¶
- numba.cuda.cudadrv.libs.get_cudalib(lib, platform=None)¶
- numba.cuda.cudadrv.libs.get_libdevice(arch)¶
- numba.cuda.cudadrv.libs.open_cudalib(lib, ccc=False)¶
- numba.cuda.cudadrv.libs.open_libdevice(arch)¶
- numba.cuda.cudadrv.libs.test(_platform=None)¶
numba.cuda.cudadrv.ndarray module¶
- class numba.cuda.cudadrv.ndarray.ArrayHeaderManager¶
Bases: object
Manages array header memory for reusing the allocation.
It allocates one big chunk of memory and partitions it into fixed-size array headers. It currently stores up to 4D array headers in 64-bit mode or 8D array headers in 32-bit mode.
This allows the small array header allocations to be reused, which avoids breaking asynchronous streams and avoids memory fragmentation.
When it runs out of preallocated space, it automatically falls back to regular allocation.
- allocate(nd)¶
- context_map = {}¶
- elemsize = 72¶
- free(mem)¶
- get_stage(*args, **kwds)¶
Get a pagelocked staging area and record the event when we are done.
- init(context)¶
- maxsize = 1024¶
- num_stages = 5¶
- write(data, to, stream=0)¶
- numba.cuda.cudadrv.ndarray.make_array_ctype(ndim)¶
Create an array header type for a given number of dimensions.
- numba.cuda.cudadrv.ndarray.ndarray_device_allocate_data(ary)¶
Allocate a GPU data buffer
- numba.cuda.cudadrv.ndarray.ndarray_populate_head(gpu_mem, gpu_data, shape, strides, stream=0)¶
Populate the array header
numba.cuda.cudadrv.nvvm module¶
This is a direct translation of nvvm.h
- class numba.cuda.cudadrv.nvvm.CompilationUnit¶
Bases: object
- add_module(buffer)¶
Add a module-level NVVM IR to a compilation unit.
The buffer should contain an NVVM module IR either in the bitcode representation (LLVM 3.0) or in the text representation.
- compile(**options)¶
Perform compilation.
The valid compiler options are:
- -g (enable generation of debugging information)
- -opt=
  - 0 (disable optimizations)
  - 3 (default, enable optimizations)
- -arch=
  - compute_20 (default)
  - compute_30
  - compute_35
- -ftz=
  - 0 (default, preserve denormal values when performing single-precision floating-point operations)
  - 1 (flush denormal values to zero when performing single-precision floating-point operations)
- -prec-sqrt=
  - 0 (use a faster approximation for single-precision floating-point square root)
  - 1 (default, use IEEE round-to-nearest mode for single-precision floating-point square root)
- -prec-div=
  - 0 (use a faster approximation for single-precision floating-point division and reciprocals)
  - 1 (default, use IEEE round-to-nearest mode for single-precision floating-point division and reciprocals)
- -fma=
  - 0 (disable FMA contraction)
  - 1 (default, enable FMA contraction)
- get_log()¶
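A minimal sketch of driving a CompilationUnit; nvvm_ir is a placeholder for NVVM-compatible LLVM IR, and the keyword spelling (opt=3, arch='compute_20') is an assumption about how **options map onto the flags listed above:

```python
from numba.cuda.cudadrv.nvvm import CompilationUnit

nvvm_ir = "..."   # placeholder: NVVM-compatible LLVM IR text or bitcode
cu = CompilationUnit()
cu.add_module(nvvm_ir)
ptx = cu.compile(opt=3, arch='compute_20')   # assumed to emit -opt=3 -arch=compute_20
print(cu.get_log())                          # compiler messages, if any
```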
- class numba.cuda.cudadrv.nvvm.NVVM¶
Bases: object
Process-wide singleton.
- check_error(error, msg, exit=False)¶
- get_version()¶
- numba.cuda.cudadrv.nvvm.fix_data_layout(module)¶
- numba.cuda.cudadrv.nvvm.get_arch_option(major, minor)¶
Matches the closest architecture option.
- numba.cuda.cudadrv.nvvm.llvm33_to_32_ir(ir)¶
Rewrite function attributes in the IR
- numba.cuda.cudadrv.nvvm.llvm_to_ptx(llvmir, **opts)¶
- numba.cuda.cudadrv.nvvm.set_cuda_kernel(lfunc)¶