4.3. Memory Management
numba.cuda.to_device(obj, stream=0, copy=True, to=None)
Allocate and transfer a NumPy ndarray or structured scalar to the device.
To copy a NumPy array from host to device:
    ary = np.arange(10)
    d_ary = cuda.to_device(ary)
To enqueue the transfer to a stream:
    stream = cuda.stream()
    d_ary = cuda.to_device(ary, stream=stream)
The resulting d_ary is a DeviceNDArray.
To copy from device to host:
    hary = d_ary.copy_to_host()
To copy from device to host into an existing array:
    ary = np.empty(shape=d_ary.shape, dtype=d_ary.dtype)
    d_ary.copy_to_host(ary)
To enqueue the transfer to a stream:
    hary = d_ary.copy_to_host(stream=stream)
numba.cuda.device_array(shape, dtype=np.float, strides=None, order='C', stream=0)
Allocate an empty device ndarray. Similar to numpy.empty().
numba.cuda.device_array_like(ary, stream=0)
Call cuda.device_array() with the shape, dtype and memory layout of ary.
numba.cuda.pinned_array(shape, dtype=np.float, strides=None, order='C')
Allocate a np.ndarray with a buffer that is pinned (page-locked). Similar to np.empty().
numba.cuda.mapped_array(shape, dtype=np.float, strides=None, order='C', stream=0, portable=False, wc=False)
Allocate a mapped ndarray with a buffer that is pinned and mapped onto the device. Similar to np.empty().
Parameters:
- portable: a boolean flag that makes the allocated memory usable by multiple devices.
- wc: a boolean flag to enable write-combined allocation, which is faster for the host to write and for the device to read, but slow for the host to read. It suits buffers that the host only writes and the device only reads.
numba.cuda.pinned(*arylist)
A context manager for temporarily pinning (page-locking) a sequence of host ndarrays.
numba.cuda.mapped(*arylist, **kws)
A context manager for temporarily mapping a sequence of host ndarrays.
4.3.1. Device Objects
class numba.cuda.cudadrv.devicearray.DeviceNDArray(shape, strides, dtype, stream=0, writeback=None, gpu_data=None)
An on-GPU array type.
copy_to_device(ary, stream=0)
Copy ary to self.
If ary is CUDA memory (for example, another device array), perform a device-to-device transfer. Otherwise, perform a host-to-device transfer.
copy_to_host(ary=None, stream=0)
Copy self to ary, or create a new NumPy ndarray if ary is None.
If a CUDA stream is given, the transfer is made asynchronously as part of the given stream. Otherwise, the transfer is synchronous: the function returns after the copy is finished. Always returns the host array.
Example:
    import numpy as np
    from numba import cuda

    arr = np.arange(1000)
    d_arr = cuda.to_device(arr)
    # my_kernel stands for any @cuda.jit kernel taking one array argument;
    # 100 blocks x 100 threads exceed 1000 elements, so it should bounds-check
    my_kernel[100, 100](d_arr)
    result_array = d_arr.copy_to_host()
is_c_contiguous()
Return True if the array is C-contiguous.
is_f_contiguous()
Return True if the array is Fortran-contiguous.
ravel(order='C', stream=0)
Flatten the array without changing its contents, similar to numpy.ndarray.ravel().
reshape(*newshape, **kws)
Reshape the array without changing its contents, similarly to numpy.ndarray.reshape(). Example:
    d_arr = d_arr.reshape(20, 50, order='F')
split(section, stream=0)
Split the array into consecutive partitions of section elements each. If the array cannot be divided equally, the last partition will be smaller.
class numba.cuda.cudadrv.devicearray.DeviceRecord(dtype, stream=0, gpu_data=None)
An on-GPU record type.
copy_to_device(ary, stream=0)
Copy ary to self.
If ary is CUDA memory (for example, another device array), perform a device-to-device transfer. Otherwise, perform a host-to-device transfer.
copy_to_host(ary=None, stream=0)
Copy self to ary, or create a new NumPy ndarray if ary is None.
If a CUDA stream is given, the transfer is made asynchronously as part of the given stream. Otherwise, the transfer is synchronous: the function returns after the copy is finished. Always returns the host array.
Example:
    import numpy as np
    from numba import cuda

    arr = np.arange(1000)
    d_arr = cuda.to_device(arr)
    # my_kernel stands for any @cuda.jit kernel taking one array argument;
    # 100 blocks x 100 threads exceed 1000 elements, so it should bounds-check
    my_kernel[100, 100](d_arr)
    result_array = d_arr.copy_to_host()
class numba.cuda.cudadrv.devicearray.MappedNDArray(shape, strides, dtype, stream=0, writeback=None, gpu_data=None)
A host array that uses CUDA mapped memory.
copy_to_device(ary, stream=0)
Copy ary to self.
If ary is CUDA memory (for example, another device array), perform a device-to-device transfer. Otherwise, perform a host-to-device transfer.
copy_to_host(ary=None, stream=0)
Copy self to ary, or create a new NumPy ndarray if ary is None.
If a CUDA stream is given, the transfer is made asynchronously as part of the given stream. Otherwise, the transfer is synchronous: the function returns after the copy is finished. Always returns the host array.
Example:
    import numpy as np
    from numba import cuda

    arr = np.arange(1000)
    d_arr = cuda.to_device(arr)
    # my_kernel stands for any @cuda.jit kernel taking one array argument;
    # 100 blocks x 100 threads exceed 1000 elements, so it should bounds-check
    my_kernel[100, 100](d_arr)
    result_array = d_arr.copy_to_host()
split(section, stream=0)
Split the array into consecutive partitions of section elements each. If the array cannot be divided equally, the last partition will be smaller.