A CUDA ND Array is recognized by checking the __cuda_memory__ attribute on the object. If the attribute exists and evaluates to True, the object must define shape, strides, dtype and size attributes similar to a NumPy ndarray.
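The recognition rule above can be sketched in plain Python. This is only an illustration under stated assumptions: the names looks_like_cuda_ndarray, check_cuda_ndarray_interface and FakeDeviceArray are hypothetical and are not part of the library, and no GPU is involved.

```python
# Attributes the text says a CUDA ND array must define.
REQUIRED_ATTRS = ("shape", "strides", "dtype", "size")


def looks_like_cuda_ndarray(obj):
    """Return True if obj has a truthy __cuda_memory__ attribute."""
    return bool(getattr(obj, "__cuda_memory__", False))


def check_cuda_ndarray_interface(obj):
    """Raise if obj is not recognized or lacks a required attribute."""
    if not looks_like_cuda_ndarray(obj):
        raise ValueError("object is not a CUDA ND array")
    missing = [name for name in REQUIRED_ATTRS if not hasattr(obj, name)]
    if missing:
        raise AttributeError("missing attributes: " + ", ".join(missing))


class FakeDeviceArray:
    """Hypothetical stand-in used only to exercise the checks."""
    __cuda_memory__ = True
    shape = (2, 3)
    strides = (24, 8)
    dtype = "float64"
    size = 6


check_cuda_ndarray_interface(FakeDeviceArray())  # passes silently
```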
Bases: numba.cuda.cudadrv.devicearray.DeviceNDArrayBase
Perform __getitem__(item) with a CUDA stream.
Bases: object
An on-GPU NDArray representation
Returns a device memory object that is used as the argument.
Bind a CUDA stream to this object so that all subsequent operations on this array default to the given stream.
Copy ary to self.
If ary is a CUDA memory object, perform a device-to-device transfer. Otherwise, perform a host-to-device transfer.
Copy self to ary, or create a new NumPy ndarray if ary is None.
Always returns the host array.
Returns the ctypes pointer to the GPU data buffer
Split the array into equal partitions of the given section size. If the array cannot be divided equally, the last section will be smaller.
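A minimal sketch of the partitioning rule, assuming a one-dimensional array of a given length. The helper section_bounds is hypothetical; it computes only the index ranges and does not touch device memory.

```python
def section_bounds(size, section):
    """Return half-open (begin, end) index pairs covering range(size).

    Every section has `section` elements except possibly the last one,
    which is smaller when `size` is not a multiple of `section`.
    """
    if section <= 0:
        raise ValueError("section size must be positive")
    return [(begin, min(begin + section, size))
            for begin in range(0, size, section)]


# Ten elements in sections of four: the last section holds two elements.
bounds = section_bounds(10, 4)
```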
Bases: numba.cuda.cudadrv.devicearray.DeviceNDArrayBase, numpy.ndarray
A host array that uses CUDA mapped memory.
Create a DeviceNDArray object that is like ary.
Check if an object is a CUDA ndarray
Raises ValueError if is_cuda_ndarray(obj) evaluates to False
Verify the CUDA ndarray interface for an obj
Expose each GPU device directly
Bases: object
Proxy into driver.Device
Get the current device or use a device by device number, and return the CUDA context.
A decorator to ensure a context for the CUDA subsystem
CUDA driver bridge implementation
NOTE: The new driver implementation uses a “trashing service” that helps prevent crashing the system (particularly OSX) when the CUDA context is corrupted at resource deallocation. The old approach tied resource management directly to the object destructor; thus, once the CUDA context was corrupted, subsequent deallocations could corrupt it further and cause the system to freeze in some cases.
Bases: object
This object is tied to the lifetime of the actual context resource.
This object is usually wrapped in a weakref proxy for the user. The user seldom owns this object.
Returns (free, total) memory in bytes in the context.
Pop context
Push context
Clean up all owned resources in this context
Bases: exceptions.RuntimeError
Bases: object
The device object owns the CUDA contexts and is itself owned by the driver object. Users should not construct devices directly.
For backward compatibility
Bases: object
Driver API functions are lazily bound.
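Lazy binding can be illustrated with a small stand-in. This is a sketch under assumptions: LazyDriver is a hypothetical class, and the dictionary passed to it plays the role of the shared library. The real driver object resolves symbols from the CUDA driver library instead.

```python
class LazyDriver:
    """Resolve each function on first access and cache it on the instance."""

    def __init__(self, library):
        self._library = library  # hypothetical: maps name -> callable
        self.lookups = 0         # counts how many names were resolved

    def __getattr__(self, name):
        # Python calls __getattr__ only when normal lookup fails, so a
        # cached function bypasses this method on later accesses.
        try:
            fn = self._library[name]
        except KeyError:
            raise AttributeError(name) from None
        self.lookups += 1
        setattr(self, name, fn)
        return fn


# The lambda is a fake implementation used only for the demonstration.
driver = LazyDriver({"cuInit": lambda flags: 0})
```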
Returns a list of active devices
Reset all devices
Bases: object
Returns True if all work before the most recent record has completed; otherwise, returns False.
Set the record state of the event at the stream.
Synchronize the host thread for the completion of the event.
All future work submitted to the stream will wait until the event completes.
Bases: tuple
FuncAttr(regs, shared, local, const, maxthreads)
Alias for field number 3
Alias for field number 2
Alias for field number 4
Alias for field number 0
Alias for field number 1
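The field layout above corresponds to an ordinary named tuple. The following sketch rebuilds an equivalent type with collections.namedtuple; the numeric values are made up for illustration and do not come from a real kernel.

```python
from collections import namedtuple

# Same field order as FuncAttr(regs, shared, local, const, maxthreads).
FuncAttr = namedtuple(
    "FuncAttr", ["regs", "shared", "local", "const", "maxthreads"]
)

# Illustrative values only.
attrs = FuncAttr(regs=32, shared=0, local=0, const=64, maxthreads=1024)

# Each named field is an alias for a positional index.
regs_by_index = attrs[0]
const_by_name = attrs.const
```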
Bases: object
Bases: object
Bases: exceptions.RuntimeError
Bases: numba.cuda.cudadrv.driver.MemoryPointer
Bases: numba.cuda.cudadrv.driver.OwnedPointer, mviewbuf.MemAlloc
Bases: object
Forces the device memory to the trash.
Bases: object
Bases: object
Bases: mviewbuf.MemAlloc
Bases: object
Bases: numba.servicelib.service.Service
We need this to enqueue things to be removed. There are times when you want to disable deallocation because that would break asynchronous work queues.
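A deferred-deallocation queue of this kind might look like the sketch below. It is an assumption-laden illustration, not the service's implementation: TrashQueue and its methods are hypothetical, and the release callbacks stand in for real resource cleanup.

```python
from collections import deque


class TrashQueue:
    """Collect release callbacks and run them only when allowed."""

    def __init__(self):
        self._pending = deque()
        self.enabled = True  # set to False while async work is in flight

    def add(self, release):
        """Enqueue a zero-argument callback that frees a resource."""
        self._pending.append(release)

    def service(self):
        """Run pending callbacks if enabled; return how many ran."""
        if not self.enabled:
            return 0
        count = 0
        while self._pending:
            self._pending.popleft()()
            count += 1
        return count


freed = []
queue = TrashQueue()
queue.add(lambda: freed.append("buffer-a"))
queue.enabled = False
skipped = queue.service()   # nothing is released while disabled
queue.enabled = True
released = queue.service()  # the pending callback runs now
```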
Get the ctypes object for the device pointer
Find the extents (half open begin and end pointer) of the underlying device memory allocation.
NOTE: it always returns the extents of the allocation, not the extents of the device memory view, which can be a subsection of the entire allocation.
Add dependencies to the device memory.
Mainly used for creating structures that point to other device memory, so that the referents are not garbage-collected and released.
Check the memory size of the device memory. The result is cached in the device memory object. It may query the driver for the memory size of the device memory allocation.
Memset on the device. If stream is not zero, asynchronous mode is used.
- dst: device memory
- val: byte value to be written
- size: number of bytes to be written
- stream: a CUDA stream
Get the device pointer as an integer
Query the device pointer type: host, device, array, unified?
NOTE: The underlying data pointer from the host data buffer is used, and it should not be changed until the operation, which can be asynchronous, completes.
Returns (start, end) the start and end pointer of the array (half open).
Get the size of the memory
NOTE: The underlying data pointer from the host data buffer is used, and it should not be changed until the operation, which can be asynchronous, completes.
Every CUDA memory object is recognized as an instance that defines the attribute “__cuda_memory__” with a value that evaluates to True.
Every CUDA memory object should also define an attribute named “device_pointer” whose value is an int (or long) object carrying the device memory address. This is not tested in this method.
image must be a pointer
Get the byte size of a contiguous memory buffer given the shape, strides and itemsize.
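One way to compute such a size is to derive half-open (start, end) byte extents from the shape and strides and take the difference. The sketch below is an assumption about how this can be done, not the library's code; memory_extents and memory_size are hypothetical names.

```python
def memory_extents(shape, strides, itemsize):
    """Half-open (start, end) byte offsets relative to the data pointer."""
    if len(shape) != len(strides):
        raise ValueError("shape and strides must have the same length")
    if any(dim == 0 for dim in shape):
        return 0, 0  # an empty array touches no memory
    start = end = 0
    for dim, stride in zip(shape, strides):
        span = (dim - 1) * stride
        if stride > 0:
            end += span    # positive strides extend the upper bound
        else:
            start += span  # negative strides extend the lower bound
    return start, end + itemsize


def memory_size(shape, strides, itemsize):
    """Number of bytes between the extents."""
    start, end = memory_extents(shape, strides, itemsize)
    return end - start


# A C-contiguous 3x4 array of 8-byte items.
nbytes = memory_size((3, 4), (32, 8), 8)
```

For a contiguous layout the result equals the element count times the item size.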
Experimental profiling context.
A sentry for methods that accept CUDA memory objects.
Enum values for CUDA driver
Bases: exceptions.Exception
Bases: exceptions.ImportError
Bases: exceptions.Exception
Bases: exceptions.ImportError
Bases: object
Manages array header memory for reusing the allocation.
It allocates one big chunk of memory and partitions it into fixed-size array headers. It currently stores array headers of up to 4 dimensions in 64-bit mode or up to 8 dimensions in 32-bit mode.
This allows small array header allocations to be reused, which avoids breaking asynchronous streams and avoids memory fragmentation.
When it runs out of preallocated space, it automatically falls back to regular allocation.
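The pooling strategy can be sketched with a host-side stand-in. This is only an illustration under assumptions: HeaderPool is a hypothetical class, a bytearray replaces the device allocation, and the slot size and count are arbitrary.

```python
class HeaderPool:
    """Hand out fixed-size slots from one preallocated chunk."""

    def __init__(self, slot_size, slot_count):
        self.slot_size = slot_size
        self.buffer = bytearray(slot_size * slot_count)  # one big chunk
        # Offsets of the free slots inside the chunk.
        self._free = [i * slot_size for i in range(slot_count)]

    def allocate(self):
        """Return ("pooled", offset), or ("fallback", None) when full."""
        if self._free:
            return ("pooled", self._free.pop())
        # Out of preallocated space: the caller would allocate normally.
        return ("fallback", None)

    def release(self, offset):
        """Return a pooled slot so it can be reused."""
        if offset is not None:
            self._free.append(offset)


pool = HeaderPool(slot_size=64, slot_count=2)
first = pool.allocate()
second = pool.allocate()
third = pool.allocate()    # the pool is exhausted here
pool.release(first[1])
reused = pool.allocate()   # gets the slot that was just released
```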
Create an array header type for a given dimension.
Allocate the GPU data buffer
Populate the array header
This is a direct translation of nvvm.h
Bases: object
Add a module level NVVM IR to a compilation unit.
- The buffer should contain an NVVM module IR either in the bitcode representation (LLVM3.0) or in the text representation.
Perform Compilation
The valid compiler options are:
- -g (enable generation of debugging information)
- -opt=
  - 0 (disable optimizations)
  - 3 (default, enable optimizations)
- -arch=
  - compute_20 (default)
  - compute_30
  - compute_35
- -ftz=
  - 0 (default, preserve denormal values when performing single-precision floating-point operations)
  - 1 (flush denormal values to zero when performing single-precision floating-point operations)
- -prec-sqrt=
  - 0 (use a faster approximation for single-precision floating-point square root)
  - 1 (default, use IEEE round-to-nearest mode for single-precision floating-point square root)
- -prec-div=
  - 0 (use a faster approximation for single-precision floating-point division and reciprocals)
  - 1 (default, use IEEE round-to-nearest mode for single-precision floating-point division and reciprocals)
- -fma=
  - 0 (disable FMA contraction)
  - 1 (default, enable FMA contraction)
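The options listed above are plain strings, so assembling them can be sketched as below. The helper build_nvvm_options and its keyword names are hypothetical; only the option spellings and default values are taken from the list, and how the resulting strings are handed to the compiler is not shown here.

```python
def build_nvvm_options(debug=False, opt=3, arch="compute_20",
                       ftz=0, prec_sqrt=1, prec_div=1, fma=1):
    """Return the option strings for the given settings."""
    if opt not in (0, 3):
        raise ValueError("opt must be 0 or 3")
    flags = (("ftz", ftz), ("prec-sqrt", prec_sqrt),
             ("prec-div", prec_div), ("fma", fma))
    for name, value in flags:
        if value not in (0, 1):
            raise ValueError("%s must be 0 or 1" % name)

    options = ["-g"] if debug else []
    options.append("-opt=%d" % opt)
    options.append("-arch=%s" % arch)
    options.extend("-%s=%d" % (name, value) for name, value in flags)
    return options


# Defaults as documented, then a debug build with optimizations disabled.
default_options = build_nvvm_options()
debug_options = build_nvvm_options(debug=True, opt=0, arch="compute_35")
```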
Bases: object
Process-wide singleton.
Matches with the closest architecture option
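A possible matching rule is sketched below. It assumes that "closest" means the highest supported architecture that does not exceed the device's compute capability, and that the supported set is the three -arch values listed among the compiler options. Both points are assumptions, and closest_arch_option is a hypothetical name.

```python
# Assumed supported compute capabilities, from the -arch option values.
SUPPORTED = [(2, 0), (3, 0), (3, 5)]


def closest_arch_option(major, minor):
    """Pick the highest supported architecture not above (major, minor)."""
    candidates = [cc for cc in SUPPORTED if cc <= (major, minor)]
    if not candidates:
        raise ValueError("compute capability %d.%d is too old"
                         % (major, minor))
    return "compute_%d%d" % max(candidates)


# A 3.2 device falls back to compute_30 under this rule.
option = closest_arch_option(3, 2)
```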
Rewrite function attributes in the IR