Writing a reduction algorithm for a CUDA GPU can be tricky. Numba provides a
@reduce decorator for converting a simple binary operation into a reduction
kernel.
@reduce

Example:
import numpy
from numba import cuda

@cuda.reduce
def sum_reduce(a, b):
    return a + b

A = numpy.arange(1234, dtype=numpy.float64) + 1
expect = A.sum()     # NumPy sum reduction
got = sum_reduce(A)  # CUDA sum reduction
assert expect == got
Lambda functions can also be used here:
sum_reduce = cuda.reduce(lambda a, b: a + b)
The reduce decorator creates an instance of the Reduce class.
(Currently, reduce is an alias to Reduce, but this behavior is not
guaranteed.)
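Because reduce currently aliases Reduce, the class can also be instantiated directly. A minimal sketch (max_reduce is a hypothetical name, not part of the API):

import numpy
from numba import cuda

# Construct the reduction object directly; equivalent to cuda.reduce(...).
max_reduce = cuda.Reduce(lambda a, b: a if a > b else b)

A = numpy.arange(10, dtype=numpy.float64)
assert max_reduce(A) == A.max()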
numba.cuda.Reduce(functor)

__call__(arr, size=None, res=None, init=0, stream=0)

Performs a full reduction.
Parameters:

- arr – A host or device array.
- size – Optional integer specifying the number of elements in arr to reduce. If this parameter is not specified, the entire array is reduced.
- res – Optional device array into which to write the reduction result. The result is written into the first element of this array. If this parameter is specified, the reduction output is not copied back to the host.
- init – Optional initial value for the reduction, the type of which must match arr.dtype.
- stream – Optional CUDA stream in which to perform the reduction. If no stream is specified, the default stream of 0 is used.

Returns: If res is specified, None is returned. Otherwise, the result of the reduction is returned.
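A hedged sketch of these keyword arguments in use, rebuilding sum_reduce from the earlier example (the values are illustrative only):

import numpy
from numba import cuda

sum_reduce = cuda.reduce(lambda a, b: a + b)
A = numpy.arange(1234, dtype=numpy.float64) + 1

partial = sum_reduce(A, size=100)   # reduce only the first 100 elements
shifted = sum_reduce(A, init=10.0)  # start the sum from 10.0 instead of 0

# Keep the result on the device by passing a device array as res;
# the result lands in res[0] and the call returns None.
dev_res = cuda.device_array(1, dtype=A.dtype)
sum_reduce(A, res=dev_res)
host_res = dev_res.copy_to_host()[0]

# Run the reduction in a non-default stream.
stream = cuda.stream()
got = sum_reduce(A, stream=stream)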
__init__(functor)

Create a reduction object that reduces values using a given binary function. The binary function is compiled once and cached inside this object. Keeping this object alive will prevent re-compilation.
Parameters:

- functor – A function to be compiled as a CUDA device function that will be used as the binary operation for reduction on a CUDA device. Internally, it is compiled using cuda.jit(device=True).
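A small sketch of the caching behavior described above, assuming a product reduction (prod_reduce is a hypothetical name; init=1.0 is passed explicitly as the multiplicative identity):

import numpy
from numba import cuda

# The compiled binary function is cached on the Reduce instance, so reusing
# the same object across calls avoids re-compilation.
prod_reduce = cuda.Reduce(lambda a, b: a * b)

A = numpy.arange(1, 11, dtype=numpy.float64)
first = prod_reduce(A, init=1.0)
second = prod_reduce(A, init=1.0)  # reuses the cached compilation
assert first == second == A.prod()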