numba.cuda.cudadrv package

Submodules

numba.cuda.cudadrv.devicearray module

A CUDA ND Array is recognized by checking the __cuda_memory__ attribute on the object. If the attribute exists and evaluates to True, the object must define shape, strides, dtype and size attributes similar to a NumPy ndarray.
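The recognition rule can be sketched in plain Python without a GPU. The names looks_like_cuda_ndarray and FakeDeviceArray are hypothetical and exist only for this illustration; the actual checks in this module may differ in detail.

```python
# Hypothetical sketch of the duck-typing rule; not numba's implementation.
REQUIRED_ATTRS = ("shape", "strides", "dtype", "size")


def looks_like_cuda_ndarray(obj):
    """Return True if obj flags itself as CUDA memory and exposes the
    ndarray-like attributes named in the text."""
    if not getattr(obj, "__cuda_memory__", False):
        return False
    return all(hasattr(obj, name) for name in REQUIRED_ATTRS)


class FakeDeviceArray(object):
    """Stand-in object carrying only the attributes the check inspects."""
    __cuda_memory__ = True

    def __init__(self, shape, strides, dtype, size):
        self.shape = shape
        self.strides = strides
        self.dtype = dtype
        self.size = size


fake = FakeDeviceArray(shape=(2, 3), strides=(24, 8), dtype="float64", size=6)
```

A plain list has no __cuda_memory__ attribute, so the same helper rejects it.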

class numba.cuda.cudadrv.devicearray.DeviceNDArray(shape, strides, dtype, stream=0, writeback=None, gpu_head=None, gpu_data=None)

Bases: numba.cuda.cudadrv.devicearray.DeviceNDArrayBase

getitem(item, stream=0)

Perform __getitem__(item) on the given CUDA stream.

is_c_contiguous()
is_f_contiguous()
ravel(order='C', stream=0)
reshape(*newshape, **kws)

reshape(self, *newshape, order='C'):

Reshape the array while keeping the original data.

class numba.cuda.cudadrv.devicearray.DeviceNDArrayBase(shape, strides, dtype, stream=0, writeback=None, gpu_head=None, gpu_data=None)

Bases: object

An on-GPU NDArray representation.

as_cuda_arg()

Returns a device memory object that is used as the argument.

bind(stream=0)

Bind a CUDA stream to this object so that all subsequent operations on this array default to the given stream.

copy_to_device(ary, stream=0)

Copy ary to self.

If ary is CUDA memory, perform a device-to-device transfer. Otherwise, perform a host-to-device transfer.

copy_to_host(ary=None, stream=0)

Copy self to ary or create a new numpy ndarray if ary is None.

Always returns the host array.

device_ctypes_pointer

Returns the ctypes pointer to the GPU data buffer

split(section, stream=0)

Split the array into equal partitions of the given section size. If the array cannot be divided equally, the last section will be smaller.
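The partitioning rule can be illustrated on an ordinary Python list. The helper split_sections is a hypothetical name; it mirrors only the sizing behavior described here and does not touch device memory or streams.

```python
# Hypothetical sketch of the partitioning rule on a host-side list.
def split_sections(items, section):
    """Cut items into consecutive chunks of length `section`; the final
    chunk is shorter when len(items) is not a multiple of `section`."""
    if section <= 0:
        raise ValueError("section must be positive")
    return [items[i:i + section] for i in range(0, len(items), section)]


# Ten elements with a section size of four: the last chunk holds the rest.
parts = split_sections(list(range(10)), 4)
```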

to_host(stream=0)
class numba.cuda.cudadrv.devicearray.MappedNDArray(shape, strides, dtype, stream=0, writeback=None, gpu_head=None, gpu_data=None)

Bases: numba.cuda.cudadrv.devicearray.DeviceNDArrayBase, numpy.ndarray

A host array that uses CUDA mapped memory.

device_setup(gpu_data, stream=0)
numba.cuda.cudadrv.devicearray.auto_device(ary, stream=0, copy=True)
numba.cuda.cudadrv.devicearray.from_array_like(ary, stream=0, gpu_head=None, gpu_data=None)

Create a DeviceNDArray object that is like ary.

numba.cuda.cudadrv.devicearray.is_cuda_ndarray(obj)

Check if an object is a CUDA ndarray

numba.cuda.cudadrv.devicearray.require_cuda_ndarray(obj)

Raises ValueError if is_cuda_ndarray(obj) evaluates to False.

numba.cuda.cudadrv.devicearray.sentry_contiguous(ary)
numba.cuda.cudadrv.devicearray.verify_cuda_ndarray_interface(obj)

Verify the CUDA ndarray interface for an obj

numba.cuda.cudadrv.devices module

Expose each GPU device directly

class numba.cuda.cudadrv.devices.GPU(gpu)

Bases: object

Proxy into driver.Device. Provides a CUDA-runtime-like layer. All threads see the same GPU list and share the same CUDA context.

associate_context()

Associate the context of this GPU with the running thread

reset()
numba.cuda.cudadrv.devices.get_context(devnum=0)

Get the current device or use a device by device number, and return the CUDA context.

numba.cuda.cudadrv.devices.get_gpu(i)
numba.cuda.cudadrv.devices.require_context(fn)

A decorator to ensure a context for the CUDA subsystem

numba.cuda.cudadrv.devices.reset()

numba.cuda.cudadrv.driver module

CUDA driver bridge implementation

NOTE: The new driver implementation uses a “trashing service” that helps prevent crashing the system (particularly OSX) when the CUDA context is corrupted at resource deallocation. The old approach tied resource management directly to the object destructor; thus, once the CUDA context was corrupted, subsequent deallocation could corrupt it further and cause the system to freeze in some cases.

class numba.cuda.cudadrv.driver.Context(device, handle, finalizer=None)

Bases: object

This object is tied to the lifetime of the actual context resource.

This object is usually wrapped in a weakref proxy for the user. Users seldom own this object directly.
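The ownership pattern can be shown with the standard weakref module. The Resource class and the owner dictionary are hypothetical stand-ins; the sketch only demonstrates that a proxy does not keep the underlying object alive.

```python
import gc
import weakref


class Resource(object):
    """Hypothetical stand-in for an object tied to a real resource."""

    def __init__(self, name):
        self.name = name


# The owner holds the only strong reference; the user gets a proxy.
owner = {"ctx": Resource("context-0")}
proxy = weakref.proxy(owner["ctx"])
name_while_alive = proxy.name

# Dropping the owner's reference invalidates the proxy.
del owner["ctx"]
gc.collect()  # makes the release explicit on non-refcounting interpreters

try:
    proxy.name
    proxy_is_dead = False
except ReferenceError:
    proxy_is_dead = True
```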

create_event(timing=True)
create_module_image(image)
create_module_ptx(ptx)
create_stream()
get_memory_info()

Returns (free, total) memory in bytes in the context.

memalloc(bytesize)
memfree(pointer)
memhostalloc(bytesize, mapped=False, portable=False, wc=False)
mempin(owner, pointer, size, mapped=False)
memunpin(pointer)
pop()

Pop context

push()

Push context

reset()

Clean up all owned resources in this context

synchronize()
unload_module(module)
exception numba.cuda.cudadrv.driver.CudaAPIError(code, msg)

Bases: numba.cuda.cudadrv.error.CudaDriverError

exception numba.cuda.cudadrv.driver.DeadMemoryError

Bases: exceptions.RuntimeError

class numba.cuda.cudadrv.driver.Device(devnum)

Bases: object

The device object owns the CUDA contexts. It is owned by the driver object. Users should not construct devices directly.

COMPUTE_CAPABILITY

For backward compatibility

close_all_context()
create_context()
get_context()
get_or_create_context()
reset()
class numba.cuda.cudadrv.driver.Driver

Bases: object

Driver API functions are lazily bound.

get_device(devnum=0)
get_device_count()
initialize()
is_available
list_devices()

Returns a list of active devices

reset()

Reset all devices

class numba.cuda.cudadrv.driver.Event(context, handle, finalizer=None)

Bases: object

elapsed_time(evtend)
query()

Returns True if all work before the most recent record has completed; otherwise, returns False.

record(stream=0)

Set the record state of the event at the stream.

synchronize()

Synchronize the host thread for the completion of the event.

wait(stream=0)

All future work submitted to the stream will wait until the event completes.

class numba.cuda.cudadrv.driver.FuncAttr

Bases: tuple

FuncAttr(regs, shared, local, const, maxthreads)

const

Alias for field number 3

local

Alias for field number 2

maxthreads

Alias for field number 4

regs

Alias for field number 0

shared

Alias for field number 1
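The field layout above matches what collections.namedtuple produces. The following sketch rebuilds an equivalent tuple type; the attribute values are made up for illustration and do not describe any real kernel.

```python
from collections import namedtuple

# Same field order as documented: indices 0 through 4.
FuncAttr = namedtuple(
    "FuncAttr", ["regs", "shared", "local", "const", "maxthreads"]
)

# Illustrative values only.
attrs = FuncAttr(regs=32, shared=0, local=0, const=64, maxthreads=1024)

# Fields are reachable by name or by position.
regs_by_index = attrs[0]
const_by_index = attrs[3]
```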

class numba.cuda.cudadrv.driver.Function(module, handle, name)

Bases: object

blockdim = (1, 1, 1)
cache_config(prefer_equal=False, prefer_cache=False, prefer_shared=False)
configure(griddim, blockdim, sharedmem=0, stream=0)
device
griddim = (1, 1, 1)
sharedmem = 0
stream = 0
class numba.cuda.cudadrv.driver.Linker

Bases: object

add_file(path, kind)
add_file_guess_ext(path)
add_ptx(ptx, name='<cudapy-ptx>')
complete()

Returns (cubin, size).

cubin is a pointer to an internal buffer owned by the linker; thus, it should be loaded before the linker is destroyed.

error_log
info_log
exception numba.cuda.cudadrv.driver.LinkerError

Bases: exceptions.RuntimeError

class numba.cuda.cudadrv.driver.MappedMemory(context, owner, hostpointer, size, finalizer=None)

Bases: numba.cuda.cudadrv.driver.MemoryPointer

own()
class numba.cuda.cudadrv.driver.MappedOwnedPointer(memptr, view=None)

Bases: numba.cuda.cudadrv.driver.OwnedPointer, mviewbuf.MemAlloc

class numba.cuda.cudadrv.driver.MemoryPointer(context, pointer, size, finalizer=None)

Bases: object

device_ctypes_pointer
free()

Forces the device memory to the trash.

memset(byte, count=None, stream=0)
own()
view(start, stop=None)
class numba.cuda.cudadrv.driver.Module(context, handle, info_log, finalizer=None)

Bases: object

get_function(name)
get_global_symbol(name)
unload()
class numba.cuda.cudadrv.driver.OwnedPointer(memptr, view=None)

Bases: object

class numba.cuda.cudadrv.driver.PinnedMemory(context, owner, pointer, size, finalizer=None)

Bases: mviewbuf.MemAlloc

own()
unpin()
class numba.cuda.cudadrv.driver.Stream(context, handle, finalizer)

Bases: object

auto_synchronize(*args, **kwds)
synchronize()
class numba.cuda.cudadrv.driver.TrashService(name='unnamed', arg=None)

Bases: numba.servicelib.service.Service

We need this to enqueue things to be removed. There are times when you want to disable deallocation because that would break asynchronous work queues.
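The idea of queuing removals and temporarily disabling deallocation can be sketched as below. TrashQueueSketch is a hypothetical, much simplified model: it does not model the service machinery or how CLEAN_LIMIT is applied, and it flushes eagerly whenever cleanup is not deferred.

```python
from contextlib import contextmanager


class TrashQueueSketch(object):
    """Hypothetical model: finalizers are queued, then run in order."""

    def __init__(self):
        self._pending = []
        self._defer_depth = 0

    def add_trash(self, finalizer):
        self._pending.append(finalizer)
        if self._defer_depth == 0:
            self.clear()

    def clear(self):
        # Run queued finalizers in the order they were added.
        while self._pending:
            self._pending.pop(0)()

    @contextmanager
    def defer_cleanup(self):
        # Nested blocks are allowed; cleanup resumes when the last exits.
        self._defer_depth += 1
        try:
            yield
        finally:
            self._defer_depth -= 1
            if self._defer_depth == 0:
                self.clear()


freed = []
trash = TrashQueueSketch()
with trash.defer_cleanup():
    trash.add_trash(lambda: freed.append("buffer-a"))
    freed_inside_block = list(freed)  # nothing released yet
```

Inside the block the finalizer stays queued; it runs once the block exits.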

CLEAN_LIMIT = 20
add_trash(item)
clear()
defer_cleanup(*args, **kwds)
process(_arg)
numba.cuda.cudadrv.driver.device_ctypes_pointer(obj)

Get the ctypes object for the device pointer

numba.cuda.cudadrv.driver.device_extents(devmem)

Find the extents (half open begin and end pointer) of the underlying device memory allocation.

NOTE: it always returns the extents of the allocation, not the extents of the device memory view, which can be a subsection of the entire allocation.
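The difference between a view and its underlying allocation can be sketched with plain integers as addresses. AllocationSketch, ViewSketch and allocation_extents are hypothetical names, and the sketch assumes the function reports the whole allocation even when handed a view.

```python
# Hypothetical model using integers as device addresses.
class AllocationSketch(object):
    def __init__(self, base, size):
        self.base = base
        self.size = size
        self.owner = self  # an allocation owns itself

    def view(self, start, stop=None):
        stop = self.size if stop is None else stop
        return ViewSketch(self, self.base + start, stop - start)


class ViewSketch(object):
    def __init__(self, owner, base, size):
        self.owner = owner
        self.base = base
        self.size = size


def allocation_extents(mem):
    """Return the half-open (begin, end) range of the owning allocation."""
    owner = mem.owner
    return owner.base, owner.base + owner.size


alloc = AllocationSketch(base=4096, size=256)
sub = alloc.view(64, 128)        # covers addresses 4160 up to 4224
extents = allocation_extents(sub)
```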

numba.cuda.cudadrv.driver.device_memory_depends(devmem, *objs)

Add dependencies to the device memory.

Mainly used for creating structures that point to other device memory, so that the referents are not garbage-collected and released.

numba.cuda.cudadrv.driver.device_memory_size(devmem)

Check the memory size of the device memory. The result is cached in the device memory object. It may query the driver for the memory size of the device memory allocation.

numba.cuda.cudadrv.driver.device_memset(dst, val, size, stream=0)

Memset on the device. If stream is not zero, asynchronous mode is used.

  • dst: device memory
  • val: byte value to be written
  • size: number of bytes to be written
  • stream: a CUDA stream

numba.cuda.cudadrv.driver.device_pointer(obj)

Get the device pointer as an integer

numba.cuda.cudadrv.driver.device_pointer_type(devmem)

Query the device pointer type: host, device, array, unified?

numba.cuda.cudadrv.driver.device_to_device(dst, src, size, stream=0)

NOTE: The underlying data pointer from the host data buffer is used, and it should not be changed until the operation, which can be asynchronous, completes.

numba.cuda.cudadrv.driver.device_to_host(dst, src, size, stream=0)

NOTE: The underlying data pointer from the host data buffer is used, and it should not be changed until the operation, which can be asynchronous, completes.

numba.cuda.cudadrv.driver.event_elapsed_time(evtstart, evtend)
numba.cuda.cudadrv.driver.find_driver()
numba.cuda.cudadrv.driver.host_memory_extents(obj)

Returns (start, end) the start and end pointer of the array (half open).

numba.cuda.cudadrv.driver.host_memory_size(obj)

Get the size of the memory

numba.cuda.cudadrv.driver.host_pointer(obj)

NOTE: The underlying data pointer from the host data buffer is used, and it should not be changed until the operation, which can be asynchronous, completes.

numba.cuda.cudadrv.driver.host_to_device(dst, src, size, stream=0)

NOTE: The underlying data pointer from the host data buffer is used, and it should not be changed until the operation, which can be asynchronous, completes.

numba.cuda.cudadrv.driver.is_device_memory(obj)

Every CUDA memory object is recognized as an instance that defines the attribute “__cuda_memory__” with a value that evaluates to True.

Every CUDA memory object should also define an attribute named “device_pointer” whose value is an int (or long) object carrying the device memory address. This is not tested in this method.

numba.cuda.cudadrv.driver.launch_kernel(cufunc_handle, griddim, blockdim, sharedmem, hstream, args)
numba.cuda.cudadrv.driver.load_module_image(context, image)

image must be a pointer

numba.cuda.cudadrv.driver.memory_size_from_info(shape, strides, itemsize)

Get the byte size of a contiguous memory buffer given the shape, strides and itemsize.
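One plausible way to derive such a size is to compute the half-open byte extents spanned by the strides and take their difference. The helpers extents_from_info and byte_size_from_info are hypothetical names for this sketch and are not claimed to match the actual implementation.

```python
# Hypothetical sketch: byte extents from shape, strides and itemsize.
def extents_from_info(shape, strides, itemsize):
    """Return (start, end) byte offsets, half open, relative to the
    address of the first element."""
    if len(shape) != len(strides):
        raise ValueError("shape and strides must have the same length")
    start, end = 0, itemsize
    for dim, stride in zip(shape, strides):
        if dim == 0:
            return 0, 0  # an empty array occupies no bytes
        span = (dim - 1) * stride
        if span < 0:
            start += span  # negative strides extend the range downward
        else:
            end += span
    return start, end


def byte_size_from_info(shape, strides, itemsize):
    start, end = extents_from_info(shape, strides, itemsize)
    return end - start


# A C-contiguous 3x4 array of 8-byte items.
size_c = byte_size_from_info((3, 4), (32, 8), 8)
```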

numba.cuda.cudadrv.driver.met_requirement_for_device(device)
numba.cuda.cudadrv.driver.profile_start()
numba.cuda.cudadrv.driver.profile_stop()
numba.cuda.cudadrv.driver.profiling(*args, **kwds)

Experimental profiling context.

numba.cuda.cudadrv.driver.require_device_memory(obj)

A sentry for methods that accept CUDA memory objects.

numba.cuda.cudadrv.drvapi module

numba.cuda.cudadrv.enums module

Enum values for CUDA driver

numba.cuda.cudadrv.error module

exception numba.cuda.cudadrv.error.CudaDriverError

Bases: exceptions.Exception

exception numba.cuda.cudadrv.error.CudaSupportError

Bases: exceptions.ImportError

exception numba.cuda.cudadrv.error.NvvmError

Bases: exceptions.Exception

exception numba.cuda.cudadrv.error.NvvmSupportError

Bases: exceptions.ImportError

numba.cuda.cudadrv.libs module

numba.cuda.cudadrv.libs.get_cudalib(lib, platform=None)
numba.cuda.cudadrv.libs.get_libdevice(arch)
numba.cuda.cudadrv.libs.open_cudalib(lib, ccc=False)
numba.cuda.cudadrv.libs.open_libdevice(arch)
numba.cuda.cudadrv.libs.test(_platform=None)

numba.cuda.cudadrv.ndarray module

class numba.cuda.cudadrv.ndarray.ArrayHeaderManager

Bases: object

Manages array header memory for reusing the allocation.

It allocates one big chunk of memory and partitions it into fixed-size array headers. It currently stores up to a 4D array header in 64-bit mode or an 8D array header in 32-bit mode.

This allows the small array header allocation to be reused to avoid breaking asynchronous streams and avoid fragmentation of memory.

When it runs out of preallocated space, it automatically falls back to regular allocation.
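The pooling strategy can be sketched on host memory with a bytearray. HeaderPoolSketch is a hypothetical model; treating maxsize as a slot count and elemsize as the slot width in bytes is an assumption made for the illustration, and no device memory or staging is involved.

```python
# Hypothetical host-side model of a fixed-size slot pool with fallback.
class HeaderPoolSketch(object):
    def __init__(self, elemsize=96, maxsize=1024):
        self.elemsize = elemsize
        self._chunk = bytearray(elemsize * maxsize)  # one big allocation
        self._free_slots = list(range(maxsize))

    def allocate(self):
        """Return (kind, slot, buffer); kind is 'pooled' or 'fallback'."""
        if self._free_slots:
            slot = self._free_slots.pop()
            offset = slot * self.elemsize
            window = memoryview(self._chunk)[offset:offset + self.elemsize]
            return ("pooled", slot, window)
        # Pool exhausted: fall back to an ordinary, separate allocation.
        return ("fallback", None, bytearray(self.elemsize))

    def free(self, handle):
        kind, slot, _ = handle
        if kind == "pooled":
            self._free_slots.append(slot)  # slot becomes reusable


pool = HeaderPoolSketch(elemsize=96, maxsize=2)
first = pool.allocate()
second = pool.allocate()
third = pool.allocate()      # pool has only two slots
pool.free(first)
reused = pool.allocate()     # takes the slot that was just released
```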

allocate(nd)
context_map = {}
elemsize = 96
free(mem)
get_stage(*args, **kwds)

Get a pagelocked staging area and record the event when we are done.

init(context)
maxsize = 1024
num_stages = 5
write(data, to, stream=0)
numba.cuda.cudadrv.ndarray.ndarray_device_allocate_data(ary)

Allocate gpu data buffer

numba.cuda.cudadrv.ndarray.ndarray_populate_head(gpu_mem, gpu_data, shape, strides, stream=0)

Populate the array header

numba.cuda.cudadrv.nvvm module

This is a direct translation of nvvm.h

class numba.cuda.cudadrv.nvvm.CompilationUnit

Bases: object

add_module(buffer)

Add a module-level NVVM IR to a compilation unit. The buffer should contain an NVVM module IR either in the bitcode representation (LLVM 3.0) or in the text representation.

compile(**options)

Perform compilation.

The valid compiler options are

    • -g (enable generation of debugging information)
    • -opt=
        • 0 (disable optimizations)
        • 3 (default, enable optimizations)
    • -arch=
        • compute_20 (default)
        • compute_30
        • compute_35
    • -ftz=
        • 0 (default, preserve denormal values when performing single-precision floating-point operations)
        • 1 (flush denormal values to zero when performing single-precision floating-point operations)
    • -prec-sqrt=
        • 0 (use a faster approximation for single-precision floating-point square root)
        • 1 (default, use IEEE round-to-nearest mode for single-precision floating-point square root)
    • -prec-div=
        • 0 (use a faster approximation for single-precision floating-point division and reciprocals)
        • 1 (default, use IEEE round-to-nearest mode for single-precision floating-point division and reciprocals)
    • -fma=
        • 0 (disable FMA contraction)
        • 1 (default, enable FMA contraction)
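Keyword options could be turned into the flag strings listed above along the following lines. The helper format_nvvm_options and the keyword spellings it accepts (g, opt, arch, ftz, prec_sqrt, prec_div, fma) are assumptions for this sketch; the keywords that compile() actually accepts are not stated here.

```python
# Hypothetical sketch: map keyword options to NVVM-style flag strings.
def format_nvvm_options(**options):
    flags = []
    for key in sorted(options):          # sorted for a stable order
        value = options[key]
        if key == "g":
            if value:
                flags.append("-g")       # -g takes no value
            continue
        if isinstance(value, bool):
            value = int(value)           # True/False become 1/0
        # Underscores in keyword names become dashes in the flag.
        flags.append("-%s=%s" % (key.replace("_", "-"), value))
    return flags


flags = format_nvvm_options(opt=3, arch="compute_35", prec_sqrt=0, g=True)
```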
get_log()
class numba.cuda.cudadrv.nvvm.LibDevice(arch)

Bases: object

get()
class numba.cuda.cudadrv.nvvm.NVVM

Bases: object

Process-wide singleton.

check_error(error, msg, exit=False)
get_version()
numba.cuda.cudadrv.nvvm.fix_data_layout(module)
numba.cuda.cudadrv.nvvm.get_arch_option(major, minor)

Matches with the closest architecture option
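One reading of "closest" is the highest supported architecture that does not exceed the device's compute capability. The sketch below follows that reading; the SUPPORTED_CC list is taken from the -arch values listed for compile(), and both the helper name closest_arch_option and the choice to raise for older devices are assumptions.

```python
# Hypothetical sketch: pick the nearest supported architecture at or
# below the device's compute capability.
SUPPORTED_CC = [(2, 0), (3, 0), (3, 5)]


def closest_arch_option(major, minor):
    candidates = [cc for cc in SUPPORTED_CC if cc <= (major, minor)]
    if not candidates:
        raise ValueError(
            "compute capability %d.%d is below every supported option"
            % (major, minor)
        )
    return "compute_%d%d" % max(candidates)


arch_for_37 = closest_arch_option(3, 7)
arch_for_32 = closest_arch_option(3, 2)
```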

numba.cuda.cudadrv.nvvm.llvm33_to_32_ir(ir)

Rewrite function attributes in the IR.

numba.cuda.cudadrv.nvvm.llvm_to_ptx(llvmir, **opts)
numba.cuda.cudadrv.nvvm.set_cuda_kernel(lfunc)

Module contents

CUDA Driver

  • Driver API binding
  • NVVM API binding
  • Device array implementation