numba.cuda.cudadrv package¶
Submodules¶
numba.cuda.cudadrv.devicearray module¶
A CUDA ND array is recognized by checking the __cuda_memory__ attribute on the object. If the attribute exists and evaluates to True, the object must also define shape, strides, dtype and size attributes similar to those of a NumPy ndarray.
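The recognition rule above is a form of duck typing. The following is a minimal sketch that assumes only what the paragraph states; `looks_like_cuda_ndarray` and `FakeDeviceArray` are hypothetical names used for illustration and are not part of numba.

```python
# Attributes the paragraph above requires on a marked object.
REQUIRED_ATTRS = ("shape", "strides", "dtype", "size")


def looks_like_cuda_ndarray(obj):
    """Return True if obj follows the protocol described above (hypothetical helper)."""
    # The marker attribute must exist and evaluate to True.
    if not getattr(obj, "__cuda_memory__", False):
        return False
    # A marked object must also expose the ndarray-like attributes.
    return all(hasattr(obj, name) for name in REQUIRED_ATTRS)


class FakeDeviceArray(object):
    """Hypothetical stand-in that holds no GPU memory at all."""
    __cuda_memory__ = True

    def __init__(self, shape, strides, dtype, size):
        self.shape = shape
        self.strides = strides
        self.dtype = dtype
        self.size = size


fake = FakeDeviceArray(shape=(2, 3), strides=(24, 8), dtype="float64", size=6)
```

Here `looks_like_cuda_ndarray(fake)` is true, while a plain list fails the check because it lacks the marker attribute.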
- class numba.cuda.cudadrv.devicearray.DeviceNDArray(shape, strides, dtype, stream=0, writeback=None, gpu_head=None, gpu_data=None)¶
Bases: numba.cuda.cudadrv.devicearray.DeviceNDArrayBase
- getitem(item, stream=0)¶
Perform __getitem__(item) on the given CUDA stream.
- is_c_contiguous()¶
- is_f_contiguous()¶
- ravel(order='C', stream=0)¶
- class numba.cuda.cudadrv.devicearray.DeviceNDArrayBase(shape, strides, dtype, stream=0, writeback=None, gpu_head=None, gpu_data=None)¶
Bases: object
An on-GPU NDArray representation.
- as_cuda_arg()¶
Returns a device memory object that is used as the argument.
- bind(stream=0)¶
Bind a CUDA stream to this object so that all subsequent operations on this array default to the given stream.
- copy_to_device(ary, stream=0)¶
Copy ary to self.
If ary is CUDA memory, perform a device-to-device transfer. Otherwise, perform a host-to-device transfer.
- copy_to_host(ary=None, stream=0)¶
Copy self to ary or create a new numpy ndarray if ary is None.
Always returns the host array.
- device_ctypes_pointer¶
Returns the ctypes pointer to the GPU data buffer
- split(section, stream=0)¶
Split the array into equal partitions of the section size. If the array cannot be divided equally, the last section will be smaller.
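The partitioning rule can be illustrated without a GPU. This sketch uses a plain Python list as a stand-in for the device array; `split_sections` is a hypothetical helper, not the numba method, and it only shows how the section boundaries fall.

```python
def split_sections(items, section):
    """Partition a sequence into chunks of `section` elements (hypothetical helper)."""
    if section <= 0:
        raise ValueError("section must be positive")
    # Step through the sequence in strides of `section`; the final slice is
    # shorter when len(items) is not a multiple of `section`.
    return [items[i:i + section] for i in range(0, len(items), section)]


chunks = split_sections(list(range(10)), 4)
sizes = [len(chunk) for chunk in chunks]
```

With ten elements and a section size of four, the first two sections hold four elements each and the last holds the remaining two.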
- to_host(stream=0)¶
- class numba.cuda.cudadrv.devicearray.MappedNDArray(shape, strides, dtype, stream=0, writeback=None, gpu_head=None, gpu_data=None)¶
Bases: numba.cuda.cudadrv.devicearray.DeviceNDArrayBase, numpy.ndarray
A host array that uses CUDA mapped memory.
- device_setup(gpu_data, stream=0)¶
- numba.cuda.cudadrv.devicearray.auto_device(ary, stream=0, copy=True)¶
- numba.cuda.cudadrv.devicearray.from_array_like(ary, stream=0, gpu_head=None, gpu_data=None)¶
Create a DeviceNDArray object that is like ary.
- numba.cuda.cudadrv.devicearray.is_cuda_ndarray(obj)¶
Check if an object is a CUDA ndarray
- numba.cuda.cudadrv.devicearray.require_cuda_ndarray(obj)¶
Raises ValueError if is_cuda_ndarray(obj) evaluates to False.
- numba.cuda.cudadrv.devicearray.sentry_contiguous(ary)¶
- numba.cuda.cudadrv.devicearray.verify_cuda_ndarray_interface(obj)¶
Verify the CUDA ndarray interface for an obj
numba.cuda.cudadrv.devices module¶
Expose each GPU device directly.
- class numba.cuda.cudadrv.devices.GPU(gpu)¶
Bases: object
Proxy into driver.Device. Provides a CUDA-runtime-like layer. All threads see the same GPU list and share the same CUDA context.
- associate_context()¶
Associate the context of this GPU to the running thread
- reset()¶
- numba.cuda.cudadrv.devices.get_context(devnum=0)¶
Get the current device or use a device by device number, and return the CUDA context.
- numba.cuda.cudadrv.devices.get_gpu(i)¶
- numba.cuda.cudadrv.devices.require_context(fn)¶
A decorator to ensure a context for the CUDA subsystem
- numba.cuda.cudadrv.devices.reset()¶
numba.cuda.cudadrv.driver module¶
CUDA driver bridge implementation
NOTE: The new driver implementation uses a “trashing service” that helps prevent the system (particularly OSX) from crashing when the CUDA context is corrupted at resource deallocation. The old approach tied resource management directly to the object destructor; thus, once the CUDA context was corrupted, subsequent deallocation could corrupt it further and cause the system to freeze in some cases.
- class numba.cuda.cudadrv.driver.Context(device, handle, finalizer=None)¶
Bases: object
This object is tied to the lifetime of the actual context resource.
This object is usually wrapped in a weakref proxy for the user. The user seldom owns this object.
- create_event(timing=True)¶
- create_module_image(image)¶
- create_module_ptx(ptx)¶
- create_stream()¶
- get_memory_info()¶
Returns (free, total) memory in bytes in the context.
- memalloc(bytesize)¶
- memfree(pointer)¶
- memhostalloc(bytesize, mapped=False, portable=False, wc=False)¶
- mempin(owner, pointer, size, mapped=False)¶
- memunpin(pointer)¶
- pop()¶
Pop context
- push()¶
Push context
- reset()¶
Clean up all owned resources in this context
- synchronize()¶
- unload_module(module)¶
- exception numba.cuda.cudadrv.driver.CudaAPIError(code, msg)¶
- exception numba.cuda.cudadrv.driver.DeadMemoryError¶
Bases: exceptions.RuntimeError
- class numba.cuda.cudadrv.driver.Device(devnum)¶
Bases: object
The device object owns the CUDA contexts and is itself owned by the driver object. Users should not construct devices directly.
- COMPUTE_CAPABILITY¶
For backward compatibility
- close_all_context()¶
- create_context()¶
- get_context()¶
- get_or_create_context()¶
- reset()¶
- class numba.cuda.cudadrv.driver.Driver¶
Bases: object
Driver API functions are lazily bound.
- get_device(devnum=0)¶
- get_device_count()¶
- initialize()¶
- is_available¶
- list_devices()¶
Returns a list of active devices
- reset()¶
Reset all devices
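Lazy binding, as mentioned in the class description, usually means that a function is resolved the first time it is accessed and cached afterwards. The sketch below shows one common way to do this with `__getattr__`; `LazyAPI` and its `loaders` mapping are hypothetical and are not the numba `Driver` implementation.

```python
class LazyAPI(object):
    """Resolve attributes on first access and cache them (hypothetical)."""

    def __init__(self, loaders):
        self._loaders = loaders  # name -> zero-argument callable returning the function
        self.resolved = []       # names resolved so far, in order

    def __getattr__(self, name):
        # __getattr__ runs only when normal attribute lookup fails,
        # which is exactly the first access of a lazily bound name.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            loader = self._loaders[name]
        except KeyError:
            raise AttributeError(name)
        fn = loader()
        self.resolved.append(name)
        setattr(self, name, fn)  # cache: later lookups bypass __getattr__
        return fn


api = LazyAPI({"get_device_count": lambda: (lambda: 2)})
first = api.get_device_count()
second = api.get_device_count()
```

The second call finds the cached attribute directly, so the loader runs only once.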
- class numba.cuda.cudadrv.driver.Event(context, handle, finalizer=None)¶
Bases: object
- elapsed_time(evtend)¶
- query()¶
Returns True if all work before the most recent record has completed; otherwise, returns False.
- record(stream=0)¶
Set the record state of the event at the stream.
- synchronize()¶
Synchronize the host thread for the completion of the event.
- wait(stream=0)¶
All future work submitted to stream will wait until the event completes.
- class numba.cuda.cudadrv.driver.FuncAttr¶
Bases: tuple
FuncAttr(regs, shared, local, const, maxthreads)
- const¶
Alias for field number 3
- local¶
Alias for field number 2
- maxthreads¶
Alias for field number 4
- regs¶
Alias for field number 0
- shared¶
Alias for field number 1
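The field aliases follow the order of the signature `FuncAttr(regs, shared, local, const, maxthreads)`. A minimal sketch, assuming `FuncAttr` behaves like a `collections.namedtuple`; the numeric values below are made up for illustration.

```python
from collections import namedtuple

# Rebuild an equivalent tuple type locally; this does not import numba.
FuncAttr = namedtuple("FuncAttr", ["regs", "shared", "local", "const", "maxthreads"])

# Made-up values, only to show positional and named access side by side.
attrs = FuncAttr(regs=32, shared=0, local=0, const=64, maxthreads=1024)

# Each name is an alias for a fixed position in the tuple.
positions = {name: index for index, name in enumerate(FuncAttr._fields)}
```

For example, `attrs.shared` and `attrs[1]` refer to the same element.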
- class numba.cuda.cudadrv.driver.Function(module, handle, name)¶
Bases: object
- blockdim = (1, 1, 1)¶
- cache_config(prefer_equal=False, prefer_cache=False, prefer_shared=False)¶
- configure(griddim, blockdim, sharedmem=0, stream=0)¶
- device¶
- griddim = (1, 1, 1)¶
- stream = 0¶
- class numba.cuda.cudadrv.driver.Linker¶
Bases: object
- add_file(path, kind)¶
- add_file_guess_ext(path)¶
- add_ptx(ptx, name='<cudapy-ptx>')¶
- complete()¶
- Returns (cubin, size)
- cubin is a pointer to an internal buffer of cubin owned by the linker; thus, it should be loaded before the linker is destroyed.
- error_log¶
- info_log¶
- exception numba.cuda.cudadrv.driver.LinkerError¶
Bases: exceptions.RuntimeError
- class numba.cuda.cudadrv.driver.MappedMemory(context, owner, hostpointer, size, finalizer=None)¶
Bases: numba.cuda.cudadrv.driver.MemoryPointer
- own()¶
- class numba.cuda.cudadrv.driver.MappedOwnedPointer(memptr, view=None)¶
Bases: numba.cuda.cudadrv.driver.OwnedPointer, mviewbuf.MemAlloc
- class numba.cuda.cudadrv.driver.MemoryPointer(context, pointer, size, finalizer=None)¶
Bases: object
- device_ctypes_pointer¶
- free()¶
Forces the device memory to the trash.
- memset(byte, count=None, stream=0)¶
- own()¶
- view(start, stop=None)¶
- class numba.cuda.cudadrv.driver.Module(context, handle, info_log, finalizer=None)¶
Bases: object
- get_function(name)¶
- get_global_symbol(name)¶
- unload()¶
- class numba.cuda.cudadrv.driver.OwnedPointer(memptr, view=None)¶
Bases: object
- class numba.cuda.cudadrv.driver.PinnedMemory(context, owner, pointer, size, finalizer=None)¶
Bases: mviewbuf.MemAlloc
- own()¶
- unpin()¶
- class numba.cuda.cudadrv.driver.Stream(context, handle, finalizer)¶
Bases: object
- auto_synchronize(*args, **kwds)¶
- synchronize()¶
- class numba.cuda.cudadrv.driver.TrashService(name='unnamed', arg=None)¶
Bases: numba.servicelib.service.Service
We need this to enqueue things to be removed. There are times when you want to disable deallocation because that would break asynchronous work queues.
- CLEAN_LIMIT = 20¶
- add_trash(item)¶
- clear()¶
- defer_cleanup(*args, **kwds)¶
- process(_arg)¶
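The idea of enqueueing resources for later removal, with a way to temporarily disable deallocation, can be sketched as a small queue. `TrashQueue` below is hypothetical; in particular, treating `CLEAN_LIMIT` as the maximum number of items released per pass is an assumption, since this page does not say what the constant controls.

```python
import contextlib
from collections import deque


class TrashQueue(object):
    """Deferred-deallocation queue (hypothetical sketch, not numba's TrashService)."""

    CLEAN_LIMIT = 20  # assumption: at most this many items are released per pass

    def __init__(self):
        self._pending = deque()
        self._deferred = 0  # nesting depth of defer_cleanup blocks

    def add_trash(self, finalizer):
        # Enqueue a zero-argument callable that releases one resource.
        self._pending.append(finalizer)

    @contextlib.contextmanager
    def defer_cleanup(self):
        # While inside this block, process() releases nothing.
        self._deferred += 1
        try:
            yield
        finally:
            self._deferred -= 1

    def process(self):
        # Release up to CLEAN_LIMIT queued items; return how many were released.
        if self._deferred:
            return 0
        released = 0
        while self._pending and released < self.CLEAN_LIMIT:
            self._pending.popleft()()
            released += 1
        return released


freed = []
queue = TrashQueue()
for i in range(25):
    queue.add_trash(lambda i=i: freed.append(i))

with queue.defer_cleanup():
    during_defer = queue.process()  # nothing is released here
first_pass = queue.process()
second_pass = queue.process()
```

Deferring cleanup this way keeps a release from interrupting work that is still queued asynchronously.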
- numba.cuda.cudadrv.driver.device_ctypes_pointer(obj)¶
Get the ctypes object for the device pointer
- numba.cuda.cudadrv.driver.device_extents(devmem)¶
Find the extents (half open begin and end pointer) of the underlying device memory allocation.
NOTE: it always returns the extents of the whole allocation, not the extents of the device memory view, which can be a subsection of the entire allocation.
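A minimal sketch of half-open extents, assuming the note means that a view reports the extents of its whole parent allocation. `Allocation`, `View` and `allocation_extents` are hypothetical names, and the addresses are made up.

```python
class Allocation(object):
    """Hypothetical record of one device allocation."""

    def __init__(self, base, size):
        self.base = base    # first byte address
        self.size = size    # length in bytes
        self.owner = self   # an allocation owns itself


class View(object):
    """Hypothetical sub-range of an allocation."""

    def __init__(self, owner, start, stop):
        self.owner = owner
        self.base = owner.base + start
        self.size = stop - start


def allocation_extents(mem):
    """Return the half-open (begin, end) range of the underlying allocation."""
    owner = mem.owner
    # `end` is one past the last valid byte, so end - begin == size.
    return owner.base, owner.base + owner.size


alloc = Allocation(base=0x1000, size=256)
view = View(alloc, start=64, stop=128)
```

Both `allocation_extents(alloc)` and `allocation_extents(view)` give the same range here, even though the view itself starts 64 bytes into the allocation.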
- numba.cuda.cudadrv.driver.device_memory_depends(devmem, *objs)¶
Add dependencies to the device memory.
Mainly used for creating structures that point to other device memory, so that the referents are not garbage-collected and released.
- numba.cuda.cudadrv.driver.device_memory_size(devmem)¶
Check the memory size of the device memory. The result is cached in the device memory object. It may query the driver for the memory size of the device memory allocation.
- numba.cuda.cudadrv.driver.device_memset(dst, val, size, stream=0)¶
Memset on the device. If stream is not zero, asynchronous mode is used.
- dst: device memory
- val: byte value to be written
- size: number of bytes to be written
- stream: a CUDA stream
- numba.cuda.cudadrv.driver.device_pointer(obj)¶
Get the device pointer as an integer
- numba.cuda.cudadrv.driver.device_pointer_type(devmem)¶
Query the device pointer type: host, device, array, unified?
- numba.cuda.cudadrv.driver.device_to_device(dst, src, size, stream=0)¶
NOTE: The underlying data pointer from the host data buffer is used, and it should not be changed until the operation, which can be asynchronous, completes.
- numba.cuda.cudadrv.driver.device_to_host(dst, src, size, stream=0)¶
NOTE: The underlying data pointer from the host data buffer is used, and it should not be changed until the operation, which can be asynchronous, completes.
- numba.cuda.cudadrv.driver.event_elapsed_time(evtstart, evtend)¶
- numba.cuda.cudadrv.driver.find_driver()¶
- numba.cuda.cudadrv.driver.host_memory_extents(obj)¶
Returns (start, end) the start and end pointer of the array (half open).
- numba.cuda.cudadrv.driver.host_memory_size(obj)¶
Get the size of the memory
- numba.cuda.cudadrv.driver.host_pointer(obj)¶
NOTE: The underlying data pointer from the host data buffer is used, and it should not be changed until the operation, which can be asynchronous, completes.
- numba.cuda.cudadrv.driver.host_to_device(dst, src, size, stream=0)¶
NOTE: The underlying data pointer from the host data buffer is used, and it should not be changed until the operation, which can be asynchronous, completes.
- numba.cuda.cudadrv.driver.is_device_memory(obj)¶
All CUDA memory objects are recognized as instances that define the attribute “__cuda_memory__” with a value that evaluates to True.
All CUDA memory objects should also define an attribute named “device_pointer” whose value is an int (or long) object carrying the pointer value of the device memory address. This is not tested in this method.
- numba.cuda.cudadrv.driver.launch_kernel(cufunc_handle, griddim, blockdim, sharedmem, hstream, args)¶
- numba.cuda.cudadrv.driver.load_module_image(context, image)¶
image must be a pointer
- numba.cuda.cudadrv.driver.memory_size_from_info(shape, strides, itemsize)¶
Get the byte size of a contiguous memory buffer given the shape, strides and itemsize.
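One common way to derive a buffer's byte size from shape, strides and itemsize is to measure the span between the lowest and highest byte any element can touch. The sketch below shows that arithmetic; `buffer_size` is a hypothetical helper and is not claimed to match numba's implementation.

```python
def buffer_size(shape, strides, itemsize):
    """Byte span covered by an array with the given layout (hypothetical helper)."""
    if len(shape) != len(strides):
        raise ValueError("shape and strides must have the same length")
    # An array with a zero-length dimension holds no elements.
    if any(n == 0 for n in shape):
        return 0
    lower = 0          # offset of the lowest byte, relative to element [0, ..., 0]
    upper = itemsize   # one past the highest byte of element [0, ..., 0]
    for n, stride in zip(shape, strides):
        offset = (n - 1) * stride  # offset of the last index along this axis
        if offset < 0:
            lower += offset        # negative strides extend the span downwards
        else:
            upper += offset
    return upper - lower


c_order = buffer_size((3, 4), (32, 8), 8)   # 3x4 array of 8-byte items, row-major
padded = buffer_size((3,), (16,), 8)        # every other 8-byte slot is used
reversed_view = buffer_size((4,), (-8,), 8) # negative stride
```

For the row-major case the result equals `3 * 4 * 8`, as expected for a densely packed array.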
- numba.cuda.cudadrv.driver.met_requirement_for_device(device)¶
- numba.cuda.cudadrv.driver.profile_start()¶
- numba.cuda.cudadrv.driver.profile_stop()¶
- numba.cuda.cudadrv.driver.profiling(*args, **kwds)¶
Experimental profiling context.
- numba.cuda.cudadrv.driver.require_device_memory(obj)¶
A sentry for methods that accept CUDA memory objects.
numba.cuda.cudadrv.drvapi module¶
numba.cuda.cudadrv.enums module¶
Enum values for CUDA driver
numba.cuda.cudadrv.error module¶
- exception numba.cuda.cudadrv.error.CudaDriverError¶
Bases: exceptions.Exception
- exception numba.cuda.cudadrv.error.CudaSupportError¶
Bases: exceptions.ImportError
- exception numba.cuda.cudadrv.error.NvvmError¶
Bases: exceptions.Exception
- exception numba.cuda.cudadrv.error.NvvmSupportError¶
Bases: exceptions.ImportError
numba.cuda.cudadrv.libs module¶
- numba.cuda.cudadrv.libs.get_cudalib(lib, platform=None)¶
- numba.cuda.cudadrv.libs.get_libdevice(arch)¶
- numba.cuda.cudadrv.libs.open_cudalib(lib, ccc=False)¶
- numba.cuda.cudadrv.libs.open_libdevice(arch)¶
- numba.cuda.cudadrv.libs.test(_platform=None)¶
numba.cuda.cudadrv.ndarray module¶
- class numba.cuda.cudadrv.ndarray.ArrayHeaderManager¶
Bases: object
Manages array header memory for reusing the allocation.
It allocates one big chunk of memory and partitions it into fixed-size array headers. It currently stores array headers of up to 4D in 64-bit mode or up to 8D in 32-bit mode.
This allows the small array header allocation to be reused to avoid breaking asynchronous streams and avoid fragmentation of memory.
When it runs out of preallocated space, it automatically falls back to regular allocation.
- allocate(nd)¶
- context_map = {}¶
- elemsize = 96¶
- free(mem)¶
- get_stage(*args, **kwds)¶
Get a pagelocked staging area and record the event when we are done.
- init(context)¶
- maxsize = 1024¶
- num_stages = 5¶
- write(data, to, stream=0)¶
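The allocation strategy described for this class (one preallocated chunk, fixed-size slots, fallback when exhausted) can be sketched as a small slot pool. `FixedSlotPool` is hypothetical; reading `maxsize` as the number of slots is an assumption, and the returned handles are plain tuples rather than device memory.

```python
class FixedSlotPool(object):
    """Fixed-size slot allocator with a fallback path (hypothetical sketch)."""

    def __init__(self, elemsize=96, maxsize=1024, fallback=None):
        self.elemsize = elemsize
        # Free slot indices, stored so that pop() hands out the lowest index first.
        self._free = list(range(maxsize - 1, -1, -1))
        # Fallback stands in for a regular, non-pooled allocation.
        self._fallback = fallback or (lambda nbytes: ("heap", nbytes))

    def allocate(self):
        if self._free:
            slot = self._free.pop()
            # The handle carries the byte offset of the slot inside the big chunk.
            return ("pool", slot * self.elemsize)
        return self._fallback(self.elemsize)

    def free(self, handle):
        kind, value = handle
        if kind == "pool":
            # Return the slot so a later allocate() can reuse it.
            self._free.append(value // self.elemsize)


pool = FixedSlotPool(elemsize=96, maxsize=2)
a = pool.allocate()
b = pool.allocate()
c = pool.allocate()   # pool exhausted: falls back
pool.free(a)
d = pool.allocate()   # reuses the slot released by `a`
```

Reusing slots this way avoids a fresh allocation for every small header, which is the motivation the class description gives.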
- numba.cuda.cudadrv.ndarray.ndarray_device_allocate_data(ary)¶
Allocate gpu data buffer
- numba.cuda.cudadrv.ndarray.ndarray_populate_head(gpu_mem, gpu_data, shape, strides, stream=0)¶
Populate the array header
numba.cuda.cudadrv.nvvm module¶
This is a direct translation of nvvm.h
- class numba.cuda.cudadrv.nvvm.CompilationUnit¶
Bases: object
- add_module(buffer)¶
Add a module-level NVVM IR to a compilation unit. The buffer should contain an NVVM module IR either in the bitcode representation (LLVM3.0) or in the text representation.
- compile(**options)¶
Perform compilation.
The valid compiler options are:
- -g (enable generation of debugging information)
- -opt=
  - 0 (disable optimizations)
  - 3 (default, enable optimizations)
- -arch=
  - compute_20 (default)
  - compute_30
  - compute_35
- -ftz=
  - 0 (default, preserve denormal values when performing single-precision floating-point operations)
  - 1 (flush denormal values to zero when performing single-precision floating-point operations)
- -prec-sqrt=
  - 0 (use a faster approximation for single-precision floating-point square root)
  - 1 (default, use IEEE round-to-nearest mode for single-precision floating-point square root)
- -prec-div=
  - 0 (use a faster approximation for single-precision floating-point division and reciprocals)
  - 1 (default, use IEEE round-to-nearest mode for single-precision floating-point division and reciprocals)
- -fma=
  - 0 (disable FMA contraction)
  - 1 (default, enable FMA contraction)
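The options above are command-line style flags. This page does not say how `compile(**options)` maps keyword arguments to those flags, so the sketch below is only one plausible mapping: the keyword names (`debug`, `opt`, `arch`, `ftz`, `prec_sqrt`, `prec_div`, `fma`) and the helper `format_options` are hypothetical.

```python
def format_options(**options):
    """Turn keyword arguments into NVVM-style flag strings (hypothetical helper)."""
    options = dict(options)
    flags = []
    # -g takes no value, so it is modelled as a boolean keyword here.
    if options.pop("debug", False):
        flags.append("-g")
    # Valued options are emitted in a fixed order; underscores become dashes.
    for key in ("opt", "arch", "ftz", "prec_sqrt", "prec_div", "fma"):
        if key in options:
            flags.append("-%s=%s" % (key.replace("_", "-"), options.pop(key)))
    # Anything left over is not in the list of valid options above.
    if options:
        raise ValueError("unknown options: %s" % ", ".join(sorted(options)))
    return flags


flags = format_options(opt=3, arch="compute_35", prec_sqrt=0, debug=True)
```

Rejecting unknown keywords early makes a misspelled option fail loudly instead of being silently ignored.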
- get_log()¶
- class numba.cuda.cudadrv.nvvm.NVVM¶
Bases: object
Process-wide singleton.
- check_error(error, msg, exit=False)¶
- get_version()¶
- numba.cuda.cudadrv.nvvm.fix_data_layout(module)¶
- numba.cuda.cudadrv.nvvm.get_arch_option(major, minor)¶
Matches with the closest architecture option
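A minimal sketch of such matching, assuming "closest" means the highest listed architecture that does not exceed the device's compute capability. The supported list is taken from the `-arch=` values on this page; `closest_arch` is a hypothetical helper, not numba's `get_arch_option`.

```python
# (major, minor) pairs corresponding to compute_20, compute_30 and compute_35.
SUPPORTED = [(2, 0), (3, 0), (3, 5)]


def closest_arch(major, minor):
    """Pick the highest supported architecture not above (major, minor)."""
    # Tuples compare element by element, so (3, 0) <= (3, 2) <= (3, 5).
    candidates = [cc for cc in SUPPORTED if cc <= (major, minor)]
    if not candidates:
        raise ValueError("compute capability %d.%d is below the minimum" % (major, minor))
    return "compute_%d%d" % max(candidates)


exact = closest_arch(3, 5)
between = closest_arch(3, 2)
newer = closest_arch(5, 0)
```

Under this assumption a 3.2 device maps to `compute_30`, and anything newer than 3.5 maps to `compute_35`.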
- numba.cuda.cudadrv.nvvm.llvm33_to_32_ir(ir)¶
Rewrite function attributes in the IR.
- numba.cuda.cudadrv.nvvm.llvm_to_ptx(llvmir, **opts)¶
- numba.cuda.cudadrv.nvvm.set_cuda_kernel(lfunc)¶