numba.cuda.to_device(obj, stream=0, copy=True, to=None)
Allocate and transfer a numpy ndarray or structured scalar to the device.
To copy a numpy array host->device:
ary = np.arange(10)
d_ary = cuda.to_device(ary)
To enqueue the transfer to a stream:
stream = cuda.stream()
d_ary = cuda.to_device(ary, stream=stream)
The resulting d_ary is a DeviceNDArray.
To copy device->host:
hary = d_ary.copy_to_host()
To copy device->host to an existing array:
ary = np.empty(shape=d_ary.shape, dtype=d_ary.dtype)
d_ary.copy_to_host(ary)
To enqueue the transfer to a stream:
hary = d_ary.copy_to_host(stream=stream)
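Transfers enqueued on a stream are asynchronous, so the stream should be synchronized before the host data is used. A minimal sketch:
stream = cuda.stream()
d_ary = cuda.to_device(ary, stream=stream)   # asynchronous host->device copy
hary = d_ary.copy_to_host(stream=stream)     # asynchronous device->host copy
stream.synchronize()                         # wait for both transfers to finish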
numba.cuda.device_array(shape, dtype=np.float, strides=None, order='C', stream=0)
Allocate an empty device ndarray. Similar to numpy.empty().
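A typical use is allocating uninitialized output storage on the device. A sketch, with a hypothetical element-wise kernel:
import numpy as np
from numba import cuda

@cuda.jit
def double(inp, out):              # hypothetical kernel for illustration
    i = cuda.grid(1)
    if i < inp.size:
        out[i] = inp[i] * 2

d_in = cuda.to_device(np.arange(1024, dtype=np.float64))
d_out = cuda.device_array(shape=d_in.shape, dtype=d_in.dtype)   # uninitialized device storage
double[8, 128](d_in, d_out)
result = d_out.copy_to_host()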
numba.cuda.device_array_like(ary, stream=0)
Call device_array() with information from the array.
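For example, to allocate device output matching the shape, dtype and layout of an existing array (a minimal sketch):
ary = np.zeros((16, 16), dtype=np.float32)
d_out = cuda.device_array_like(ary)   # same shape, dtype and strides as ary, contents uninitialized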
numba.cuda.pinned_array(shape, dtype=np.float, strides=None, order='C')
Allocate a np.ndarray with a buffer that is pinned (pagelocked). Similar to np.empty().
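Pinned host memory is what makes stream transfers truly asynchronous. A sketch of staging a device result into a pinned host buffer:
stream = cuda.stream()
d_ary = cuda.to_device(np.arange(1000, dtype=np.float64), stream=stream)
host_buf = cuda.pinned_array(d_ary.shape, dtype=d_ary.dtype)   # page-locked host buffer
d_ary.copy_to_host(host_buf, stream=stream)                    # asynchronous device->host copy
stream.synchronize()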
numba.cuda.mapped_array(shape, dtype=np.float, strides=None, order='C', stream=0, portable=False, wc=False)
Allocate a mapped ndarray with a buffer that is pinned and mapped on to the device. Similar to np.empty().
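Because the buffer is mapped into the device address space, a kernel can read and write it directly; the host sees the updates after synchronizing. A sketch, with a hypothetical kernel:
@cuda.jit
def increment(a):                  # hypothetical kernel for illustration
    i = cuda.grid(1)
    if i < a.size:
        a[i] += 1

m_ary = cuda.mapped_array(256, dtype=np.float32)
m_ary[:] = 0
increment[2, 128](m_ary)
cuda.synchronize()                 # ensure the kernel finished before reading on the host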
numba.cuda.pinned(*arylist)
A context manager for temporarily pinning a sequence of host ndarrays.
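A sketch of pinning an existing host array just for the duration of an asynchronous transfer:
ary = np.arange(10)
stream = cuda.stream()
with cuda.pinned(ary):                          # ary is page-locked inside the block
    d_ary = cuda.to_device(ary, stream=stream)
    stream.synchronize()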
numba.cuda.mapped(*arylist, **kws)
A context manager for temporarily mapping a sequence of host ndarrays.
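A sketch of temporarily mapping a host array so a kernel can operate on it in place (with a single array, the context manager yields its mapped view; the kernel is hypothetical):
ary = np.zeros(128, dtype=np.float64)
with cuda.mapped(ary) as m_ary:       # m_ary is the device-accessible view of ary
    my_kernel[1, 128](m_ary)          # hypothetical kernel writing into the mapped buffer
    cuda.synchronize()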
class numba.cuda.cudadrv.devicearray.DeviceNDArray(shape, strides, dtype, stream=0, writeback=None, gpu_data=None)
An on-GPU array type.
copy_to_device(self, ary, stream=0)
Copy ary to self.
If ary is a CUDA memory, perform a device-to-device transfer. Otherwise, perform a host-to-device transfer.
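A small sketch covering both cases:
a = np.arange(10, dtype=np.float64)
d_a = cuda.to_device(a)
d_b = cuda.device_array_like(a)
d_b.copy_to_device(d_a)               # device-to-device transfer
d_b.copy_to_device(np.ones(10))       # host-to-device transfer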
copy_to_host(self, ary=None, stream=0)
Copy self to ary or create a new Numpy ndarray if ary is None.
If a CUDA stream is given, then the transfer will be made asynchronously as part of the given stream. Otherwise, the transfer is synchronous: the function returns after the copy is finished.
Always returns the host array.
Example:
import numpy as np
from numba import cuda
arr = np.arange(1000)
d_arr = cuda.to_device(arr)
my_kernel[100, 100](d_arr)
result_array = d_arr.copy_to_host()
is_c_contiguous(self)
Return true if the array is C-contiguous.
is_f_contiguous(self)
Return true if the array is Fortran-contiguous.
ravel(self, order='C', stream=0)
Flatten the array without changing its contents, similar to numpy.ndarray.ravel().
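For example, flattening a 2D device array into a 1D device array:
d_arr = cuda.to_device(np.arange(100).reshape(10, 10))
d_flat = d_arr.ravel()                # flattened 1D device array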
reshape(self, *newshape, **kws)
Reshape the array without changing its contents, similarly to numpy.ndarray.reshape(). Example:
d_arr = d_arr.reshape(20, 50, order='F')
split(self, section, stream=0)
Split the array into equal partitions of the section size. If the array cannot be equally divided, the last section will be smaller.
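A sketch of iterating over fixed-size sections of a device array, e.g. to process chunks on separate streams:
d_arr = cuda.to_device(np.arange(10))
for chunk in d_arr.split(4):          # device views of 4, 4 and 2 elements
    print(chunk.shape)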
class numba.cuda.cudadrv.devicearray.DeviceRecord(dtype, stream=0, gpu_data=None)
An on-GPU record type.
copy_to_device(self, ary, stream=0)
Copy ary to self.
If ary is a CUDA memory, perform a device-to-device transfer. Otherwise, perform a host-to-device transfer.
copy_to_host(self, ary=None, stream=0)
Copy self to ary or create a new Numpy ndarray if ary is None.
If a CUDA stream is given, then the transfer will be made asynchronously as part of the given stream. Otherwise, the transfer is synchronous: the function returns after the copy is finished.
Always returns the host array.
Example:
import numpy as np
from numba import cuda
arr = np.arange(1000)
d_arr = cuda.to_device(arr)
my_kernel[100, 100](d_arr)
result_array = d_arr.copy_to_host()
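A sketch of round-tripping a single structured scalar through the device (the record dtype is illustrative):
rec_dtype = np.dtype([('x', np.float64), ('y', np.int32)])
rec = np.zeros(1, dtype=rec_dtype)[0]   # a NumPy structured scalar
d_rec = cuda.to_device(rec)             # lives on the GPU as a DeviceRecord
host_rec = d_rec.copy_to_host()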
class numba.cuda.cudadrv.devicearray.MappedNDArray(shape, strides, dtype, stream=0, writeback=None, gpu_data=None)
A host array that uses CUDA mapped memory.
copy_to_device(self, ary, stream=0)
Copy ary to self.
If ary is a CUDA memory, perform a device-to-device transfer. Otherwise, perform a host-to-device transfer.
copy_to_host(self, ary=None, stream=0)
Copy self to ary or create a new Numpy ndarray if ary is None.
If a CUDA stream is given, then the transfer will be made asynchronously as part of the given stream. Otherwise, the transfer is synchronous: the function returns after the copy is finished.
Always returns the host array.
Example:
import numpy as np
from numba import cuda
arr = np.arange(1000)
d_arr = cuda.to_device(arr)
my_kernel[100, 100](d_arr)
result_array = d_arr.copy_to_host()
split(self, section, stream=0)
Split the array into equal partitions of the section size. If the array cannot be equally divided, the last section will be smaller.