3.3. Memory management¶
3.3.1. Data transfer¶
Even though Numba can automatically transfer NumPy arrays to the device, it can only do so conservatively by always transferring device memory back to the host when a kernel finishes. To avoid the unnecessary transfer for read-only arrays, you can use the following APIs to manually control the transfer:
- numba.cuda.device_array(shape, dtype=np.float, strides=None, order='C', stream=0)
  Allocate an empty device ndarray. Similar to numpy.empty().
- numba.cuda.device_array_like(ary, stream=0)
  Call device_array() with information from the array.
- numba.cuda.to_device(obj, stream=0, copy=True, to=None)
  Allocate and transfer a numpy ndarray or structured scalar to the device.
  To copy host->device a numpy array:

      ary = numpy.arange(10)
      d_ary = cuda.to_device(ary)

  To enqueue the transfer to a stream:

      stream = cuda.stream()
      d_ary = cuda.to_device(ary, stream=stream)

  The resulting d_ary is a DeviceNDArray. To copy device->host:

      hary = d_ary.copy_to_host()

  To copy device->host to an existing array:

      ary = numpy.empty(shape=d_ary.shape, dtype=d_ary.dtype)
      d_ary.copy_to_host(ary)

  To enqueue the transfer to a stream:

      hary = d_ary.copy_to_host(stream=stream)
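A minimal sketch of avoiding the automatic copy-back (the kernel name my_kernel is only illustrative): transfer the read-only input once, allocate the output directly on the device, and copy back only the result.

    import numpy as np
    from numba import cuda

    ary = np.arange(10, dtype=np.float64)

    d_in = cuda.to_device(ary)              # copy the read-only input host -> device
    d_out = cuda.device_array_like(d_in)    # uninitialized output buffer on the device

    my_kernel[1, 32](d_in, d_out)           # hypothetical kernel writing into d_out

    result = d_out.copy_to_host()           # only the output is transferred back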
3.3.1.1. Device arrays¶
Device array references have the following methods. These methods are to be called in host code, not within CUDA-jitted functions.
- class numba.cuda.cudadrv.devicearray.DeviceNDArray(shape, strides, dtype, stream=0, writeback=None, gpu_data=None)
  An on-GPU array type.
  - copy_to_host(ary=None, stream=0)
    Copy self to ary or create a new Numpy ndarray if ary is None.

    If a CUDA stream is given, then the transfer will be made asynchronously as part of the given stream. Otherwise, the transfer is synchronous: the function returns after the copy is finished.

    Always returns the host array.

    Example:

        import numpy as np
        from numba import cuda

        arr = np.arange(1000)
        d_arr = cuda.to_device(arr)
        my_kernel[100, 100](d_arr)
        result_array = d_arr.copy_to_host()
  - is_c_contiguous()
    Return true if the array is C-contiguous.
  - is_f_contiguous()
    Return true if the array is Fortran-contiguous.
  - ravel(order='C', stream=0)
    Flatten the array without changing its contents, similar to numpy.ndarray.ravel() (see the sketch after this list).
  - reshape(*newshape, **kws)
    Reshape the array without changing its contents, similarly to numpy.ndarray.reshape(). Example:

        d_arr = d_arr.reshape(20, 50, order='F')
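A brief host-side sketch of the contiguity, ravel and reshape methods (the array contents are only illustrative):

    import numpy as np
    from numba import cuda

    d_arr = cuda.to_device(np.arange(1000).reshape(20, 50))
    d_arr.is_c_contiguous()          # True: freshly copied C-ordered data
    flat = d_arr.ravel()             # 1-D device array, contents unchanged
    d_arr = d_arr.reshape(50, 20)    # reshaping also leaves the data in place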
3.3.2. Pinned memory¶
- numba.cuda.pinned(*arylist)
  A context manager for temporarily pinning a sequence of host ndarrays (see the sketch after this list).
- numba.cuda.pinned_array(shape, dtype=np.float, strides=None, order='C')
  Allocate a numpy.ndarray with a buffer that is pinned (pagelocked). Similar to numpy.empty().
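A minimal sketch of how pinned() and pinned_array() might be used together with a stream (the shapes and dtypes are only illustrative):

    import numpy as np
    from numba import cuda

    ary = np.arange(10)
    stream = cuda.stream()

    # Pin the host array only for the duration of the asynchronous copy.
    with cuda.pinned(ary):
        d_ary = cuda.to_device(ary, stream=stream)
        stream.synchronize()

    # Alternatively, allocate host memory that is already pinned.
    hary = cuda.pinned_array(10, dtype=np.float64)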
3.3.3. Streams¶
- numba.cuda.stream()
  Create a CUDA stream that represents a command queue for the device.
CUDA streams have the following methods:
- class numba.cuda.cudadrv.driver.Stream(context, handle, finalizer)

  - auto_synchronize()
    A context manager that waits for all commands in this stream to execute and commits any pending memory transfers upon exiting the context (see the sketch after this list).

  - synchronize()
    Wait for all commands in this stream to execute. This will commit any pending memory transfers.
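A minimal sketch of auto_synchronize() and synchronize() (the data is only illustrative):

    import numpy as np
    from numba import cuda

    arr = np.arange(1000)
    stream = cuda.stream()

    # Every command queued inside the block has completed when the block exits.
    with stream.auto_synchronize():
        d_arr = cuda.to_device(arr, stream=stream)

    # The explicit equivalent is to queue work and then call stream.synchronize().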
3.3.5. Local memory¶
Local memory is an area of memory private to each thread. Using local memory helps allocate some scratchpad area when scalar local variables are not enough. The memory is allocated once for the duration of the kernel, unlike traditional dynamic memory management.
- numba.cuda.local.array(shape, type)
  Allocate a local array of the given shape and type on the device. The array is private to the current thread. An array-like object is returned which can be read and written to like any standard array (e.g. through indexing).
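A minimal sketch of a per-thread scratch buffer (the kernel name and the computation are only illustrative):

    import numpy as np
    from numba import cuda, float32

    @cuda.jit
    def scale_sums(x, out):
        # Hypothetical kernel: each thread fills a small private scratch array
        # and reduces it into one output element.
        i = cuda.grid(1)
        if i < x.shape[0]:
            scratch = cuda.local.array(4, float32)   # shape must be a compile-time constant
            for j in range(4):
                scratch[j] = x[i] * (j + 1)
            acc = 0.0
            for j in range(4):
                acc += scratch[j]
            out[i] = acc

    x = np.arange(256, dtype=np.float32)
    out = np.zeros_like(x)
    scale_sums[4, 64](x, out)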