4.3. Memory Management

numba.cuda.to_device(obj, stream=0, copy=True, to=None)

Allocate and transfer a numpy ndarray or structured scalar to the device.

To copy a numpy array host->device:

ary = np.arange(10)
d_ary = cuda.to_device(ary)

To enqueue the transfer to a stream:

stream = cuda.stream()
d_ary = cuda.to_device(ary, stream=stream)

The resulting d_ary is a DeviceNDArray.

To copy device->host:

hary = d_ary.copy_to_host()

To copy device->host to an existing array:

ary = np.empty(shape=d_ary.shape, dtype=d_ary.dtype)
d_ary.copy_to_host(ary)

To enqueue the transfer to a stream:

hary = d_ary.copy_to_host(stream=stream)

numba.cuda.device_array(shape, dtype=np.float, strides=None, order='C', stream=0)

Allocate an empty device ndarray. Similar to numpy.empty().

numba.cuda.device_array_like(ary, stream=0)

Call cuda.device_array() with information from the array.

numba.cuda.pinned_array(shape, dtype=np.float, strides=None, order='C')

Allocate a np.ndarray with a buffer that is pinned (page-locked). Similar to np.empty().

numba.cuda.mapped_array(shape, dtype=np.float, strides=None, order='C', stream=0, portable=False, wc=False)

Allocate a mapped ndarray with a buffer that is pinned and mapped onto the device. Similar to np.empty().

  • portable – a boolean flag to allow the allocated device memory to be usable in multiple devices.
  • wc – a boolean flag to enable write-combined allocation, which is faster to write by the host and to read by the device, but slower to read by the host and slower to write by the device.

numba.cuda.pinned(*args, **kws)

A context manager for temporarily pinning a sequence of host ndarrays.

numba.cuda.mapped(*args, **kws)

A context manager for temporarily mapping a sequence of host ndarrays.

4.3.1. Device Objects

class numba.cuda.cudadrv.devicearray.DeviceNDArray(shape, strides, dtype, stream=0, writeback=None, gpu_data=None)

An on-GPU array type

copy_to_device(ary, stream=0)

Copy ary to self.

If ary is CUDA memory, perform a device-to-device transfer. Otherwise, perform a host-to-device transfer.

copy_to_host(ary=None, stream=0)

Copy self to ary or create a new NumPy ndarray if ary is None.

If a CUDA stream is given, then the transfer will be made asynchronously as part of the given stream. Otherwise, the transfer is synchronous: the function returns after the copy is finished.

Always returns the host array.

Example:

import numpy as np
from numba import cuda

arr = np.arange(1000)
d_arr = cuda.to_device(arr)

my_kernel[100, 100](d_arr)  # my_kernel: a @cuda.jit kernel assumed to be defined elsewhere

result_array = d_arr.copy_to_host()

is_c_contiguous()

Return true if the array is C-contiguous.

is_f_contiguous()

Return true if the array is Fortran-contiguous.

ravel(order='C', stream=0)

Flatten the array without changing its contents, similar to numpy.ndarray.ravel().

reshape(*newshape, **kws)

Reshape the array without changing its contents, similar to numpy.ndarray.reshape(). Example:

d_arr = d_arr.reshape(20, 50, order='F')

split(section, stream=0)

Split the array into equal partitions of the section size. If the array cannot be equally divided, the last section will be smaller.

class numba.cuda.cudadrv.devicearray.DeviceRecord(dtype, stream=0, gpu_data=None)

An on-GPU record type

copy_to_device(ary, stream=0)

Copy ary to self.

If ary is CUDA memory, perform a device-to-device transfer. Otherwise, perform a host-to-device transfer.

copy_to_host(ary=None, stream=0)

Copy self to ary or create a new NumPy ndarray if ary is None.

If a CUDA stream is given, then the transfer will be made asynchronously as part of the given stream. Otherwise, the transfer is synchronous: the function returns after the copy is finished.

Always returns the host array.

Example:

import numpy as np
from numba import cuda

arr = np.arange(1000)
d_arr = cuda.to_device(arr)

my_kernel[100, 100](d_arr)  # my_kernel: a @cuda.jit kernel assumed to be defined elsewhere

result_array = d_arr.copy_to_host()

class numba.cuda.cudadrv.devicearray.MappedNDArray(shape, strides, dtype, stream=0, writeback=None, gpu_data=None)

A host array that uses CUDA mapped memory.

copy_to_device(ary, stream=0)

Copy ary to self.

If ary is CUDA memory, perform a device-to-device transfer. Otherwise, perform a host-to-device transfer.

copy_to_host(ary=None, stream=0)

Copy self to ary or create a new NumPy ndarray if ary is None.

If a CUDA stream is given, then the transfer will be made asynchronously as part of the given stream. Otherwise, the transfer is synchronous: the function returns after the copy is finished.

Always returns the host array.

Example:

import numpy as np
from numba import cuda

arr = np.arange(1000)
d_arr = cuda.to_device(arr)

my_kernel[100, 100](d_arr)  # my_kernel: a @cuda.jit kernel assumed to be defined elsewhere

result_array = d_arr.copy_to_host()

split(section, stream=0)

Split the array into equal partitions of the section size. If the array cannot be equally divided, the last section will be smaller.