Although Numba can automatically transfer NumPy arrays to the device, it can only do so conservatively: device memory is always transferred back to the host when a kernel finishes. To avoid this unnecessary transfer for read-only arrays, use the following APIs to control transfers manually:
numba.roc.device_array(shape, dtype=np.float, strides=None, order='C')
Allocate an empty device ndarray. Similar to numpy.empty().
numba.roc.device_array_like(ary)
Call roc.device_array() with information from the array.
numba.roc.to_device(obj, context, copy=True, to=None)
Allocate and transfer a NumPy ndarray or structured scalar to the device.
To copy a NumPy array from host to device:
import numpy
from numba import roc
ary = numpy.arange(10)
d_ary = roc.to_device(ary)
The resulting d_ary is a DeviceNDArray.
To copy from device to host:
hary = d_ary.copy_to_host()
To copy from device to host into an existing array:
ary = numpy.empty(shape=d_ary.shape, dtype=d_ary.dtype)
d_ary.copy_to_host(ary)
Device array references have the following methods. These methods are to be called in host code, not within ROC-jitted functions.
numba.roc.hsadrv.devicearray.DeviceNDArray(shape, strides, dtype, dgpu_data=None)
An on-dGPU array type.
copy_to_host(self, ary=None, stream=None)
Copy self to ary, or create a new NumPy ndarray if ary is None.
The transfer is synchronous: the function returns after the copy is finished.
Always returns the host array.
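Example:
import numpy as np
from numba import roc
@roc.jit
def my_kernel(arr):  # hypothetical kernel: doubles each element
    i = roc.get_global_id(0)
    if i < arr.size:  # 100 x 100 work-items, but only 1000 elements
        arr[i] *= 2

arr = np.arange(1000)
d_arr = roc.to_device(arr)           # host -> device
my_kernel[100, 100](d_arr)           # 100 work-groups of 100 work-items
result_array = d_arr.copy_to_host()  # device -> host, synchronous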
is_c_contiguous(self)
Return True if the array is C-contiguous.
is_f_contiguous(self)
Return True if the array is Fortran-contiguous.
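Both contiguity predicates depend only on an array's shape, its strides (in bytes) and its item size. The host-only sketch below reproduces the rule NumPy uses for its C_CONTIGUOUS and F_CONTIGUOUS flags. We assume DeviceNDArray applies an equivalent rule. The function names here are our own and are not part of numba.roc.

```python
import numpy as np

def _contiguous(shape, strides, itemsize):
    # Walk axes from fastest-varying to slowest, checking each stride
    if 0 in shape:
        return True  # arrays with a zero-length axis count as contiguous
    expected = itemsize
    for dim, stride in zip(shape, strides):
        # Axes of length 1 never advance, so their stride is irrelevant
        if dim != 1 and stride != expected:
            return False
        expected *= dim
    return True

def is_c_contiguous(shape, strides, itemsize):
    # C order: the last axis varies fastest, so check axes from last to first
    return _contiguous(tuple(shape)[::-1], tuple(strides)[::-1], itemsize)

def is_f_contiguous(shape, strides, itemsize):
    # Fortran order: the first axis varies fastest
    return _contiguous(tuple(shape), tuple(strides), itemsize)

a = np.zeros((3, 4), dtype=np.float64)
for arr in (a, a.T, a[:, ::2], np.asfortranarray(a)):
    print(arr.shape, arr.strides,
          is_c_contiguous(arr.shape, arr.strides, arr.itemsize),
          is_f_contiguous(arr.shape, arr.strides, arr.itemsize))
```

For example, a transpose swaps the strides, turning a C-contiguous array into a Fortran-contiguous one. A strided slice such as a[:, ::2] is neither.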
ravel(self, order='C')
Flatten the array without changing its contents, similar to numpy.ndarray.ravel().
reshape(self, *newshape, **kws)
Reshape the array without changing its contents, similar to numpy.ndarray.reshape(). Example:
d_arr = d_arr.reshape(20, 50, order='F')
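Both methods are described as similar to their NumPy counterparts, so you can preview the effect of the order argument on the host with plain NumPy. This sketch assumes the device methods follow the same element-ordering rules. It does not touch the GPU.

```python
import numpy as np

a = np.arange(6)

# order='C' (default): fill row by row, the last index varies fastest
c = a.reshape(2, 3)
print(c.tolist())                    # [[0, 1, 2], [3, 4, 5]]

# order='F': fill column by column, the first index varies fastest
f = a.reshape(2, 3, order='F')
print(f.tolist())                    # [[0, 2, 4], [1, 3, 5]]

# ravel reads the elements back out in the requested order
print(f.ravel(order='C').tolist())   # [0, 2, 4, 1, 3, 5]
print(f.ravel(order='F').tolist())   # [0, 1, 2, 3, 4, 5]

# The shape from the reshape example: 1000 elements viewed as 20 x 50
d = np.arange(1000).reshape(20, 50, order='F')
print(d.shape, d[1, 0], d[0, 1])     # (20, 50) 1 20
```

With order='F', neighbouring elements of the flat array end up in the same column. As a result, d[0, 1] is 20 rather than 1.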
The CPU and GPU do not share the same main memory; however, it is recommended to register memory allocations with the HSA runtime as a performance optimisation hint.
roc.register(*arrays)
Register every given array. The function can be used in a with-context for automatic deregistration:
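import numpy
from numba import roc
array_a = numpy.arange(10)
array_b = numpy.arange(10)
with roc.register(array_a, array_b):
    # some_hsa_code stands for any code that uses the registered arrays
    some_hsa_code(array_a, array_b)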
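The with-context form follows the usual acquire/release pattern. Registration happens on entry, and deregistration runs in a finally clause, so it still happens if the body raises. The sketch below shows that pattern with contextlib. The names registered, fake_register and fake_deregister are hypothetical and used only for illustration. This is not necessarily how numba.roc implements register.

```python
from contextlib import contextmanager

@contextmanager
def registered(register_fn, deregister_fn, *arrays):
    # register_fn / deregister_fn are hypothetical stand-ins for runtime calls
    register_fn(*arrays)
    try:
        yield arrays
    finally:
        # Runs on normal exit and also when the with-body raises
        deregister_fn(*arrays)

log = []

def fake_register(*arrays):
    log.append(("register", len(arrays)))

def fake_deregister(*arrays):
    log.append(("deregister", len(arrays)))

try:
    with registered(fake_register, fake_deregister, [1], [2]):
        raise RuntimeError("work failed")
except RuntimeError:
    pass
print(log)  # [('register', 2), ('deregister', 2)]
```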
roc.deregister(*arrays)
Deregister every given array.
numba.roc.stream()
ROC streams have the following methods:
numba.roc.hsadrv.driver.Stream
An asynchronous stream for the async API.
auto_synchronize(self)
A context manager that waits for all commands in this stream to execute and commits any pending memory transfers upon exiting the context.
synchronize(self)
Synchronize the stream.
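The toy, host-only model below makes the difference between asynchronous submission and waiting concrete. ToyStream and enqueue are hypothetical names, not numba.roc APIs. The model captures only the waiting behaviour of synchronize() and auto_synchronize(). It does not model device memory transfers. A single-worker thread pool stands in for an in-order command queue.

```python
from concurrent.futures import ThreadPoolExecutor
from contextlib import contextmanager

class ToyStream:
    """Hypothetical in-order command queue (not numba's Stream class)."""

    def __init__(self):
        # One worker runs commands one at a time, in submission order
        self._executor = ThreadPoolExecutor(max_workers=1)
        self._pending = []

    def enqueue(self, fn, *args):
        # Asynchronous: returns a future immediately, without waiting
        fut = self._executor.submit(fn, *args)
        self._pending.append(fut)
        return fut

    def synchronize(self):
        # Block until every command submitted so far has finished
        pending, self._pending = self._pending, []
        for fut in pending:
            fut.result()  # waits, and re-raises any exception from the command

    @contextmanager
    def auto_synchronize(self):
        # Let the caller enqueue work, then wait for all of it on exit
        try:
            yield self
        finally:
            self.synchronize()

results = []
stream = ToyStream()
with stream.auto_synchronize() as s:
    for i in range(5):
        s.enqueue(results.append, i)
print(results)  # [0, 1, 2, 3, 4]
```

Inside the with-block the results may still be incomplete. Once the block exits, every enqueued command has run.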