Writing a reduction algorithm for a CUDA GPU can be tricky. Numba provides a
@reduce decorator for converting a simple binary operation into a reduction
kernel. For example:
```python
import numpy
from numba import cuda

@cuda.reduce
def sum_reduce(a, b):
    return a + b

A = (numpy.arange(1234, dtype=numpy.float64)) + 1
expect = A.sum()      # NumPy sum reduction
got = sum_reduce(A)   # CUDA sum reduction
assert expect == got
```
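The tricky part the decorator hides is that a GPU reduction is typically computed as a parallel tree of pairwise applications of the binary operation, which is why the operation should be associative. A minimal CPU sketch of that pattern (`tree_reduce` is a hypothetical helper for illustration, not part of Numba):

```python
def tree_reduce(binop, values):
    # Pairwise (tree-shaped) reduction: the parallel-friendly pattern a GPU
    # reduction kernel typically follows, sketched sequentially on the CPU.
    vals = list(values)
    while len(vals) > 1:
        paired = [binop(vals[i], vals[i + 1]) for i in range(0, len(vals) - 1, 2)]
        if len(vals) % 2:        # odd element carries over to the next round
            paired.append(vals[-1])
        vals = paired
    return vals[0]

# For an associative operation, this matches a left-to-right fold:
assert tree_reduce(lambda a, b: a + b, range(1, 1235)) == sum(range(1, 1235))
```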
A lambda function can also be used:
```python
sum_reduce = cuda.reduce(lambda a, b: a + b)
```
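The decorator form and the call form are equivalent because a decorator is ordinary function application. A plain-Python analogue (`cpu_reduce` is a hypothetical stand-in for `cuda.reduce`, reducing sequentially on the CPU):

```python
def cpu_reduce(binop):
    # Stand-in for cuda.reduce: wraps a binary op in a (sequential, CPU)
    # reducer, illustrating why @cpu_reduce and cpu_reduce(lambda ...)
    # produce the same kind of object.
    def reducer(arr, init=0):
        acc = init
        for x in arr:
            acc = binop(acc, x)
        return acc
    return reducer

@cpu_reduce
def sum_reduce_a(a, b):
    return a + b

sum_reduce_b = cpu_reduce(lambda a, b: a + b)
assert sum_reduce_a([1, 2, 3]) == sum_reduce_b([1, 2, 3]) == 6
```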
The reduce decorator creates an instance of the Reduce class. Currently,
reduce is an alias to Reduce, but this behavior is not guaranteed.
Reduce(binop)

Create a reduction object that reduces values using a given binary function. The binary function is compiled once and cached inside this object. Keeping this object alive will prevent re-compilation.

Parameters:
    binop – A function to be compiled as a CUDA device function that will be used as the binary operation for reduction on a CUDA device. Internally, it is compiled using cuda.jit(device=True).

__call__(arr, size=None, res=None, init=0, stream=0)

Performs a full reduction.
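The size and init arguments shape what the full reduction computes. A minimal CPU model of those semantics (`reduce_call_model` is a hypothetical helper; the exact treatment of each argument is an assumption for illustration, and res and stream are omitted):

```python
def reduce_call_model(binop, arr, size=None, init=0):
    # CPU sketch (an assumption for illustration, not the CUDA code path):
    # reduce the first `size` elements of `arr`, folding `init` into the
    # result; size=None means the whole array.
    n = len(arr) if size is None else size
    acc = init
    for x in arr[:n]:
        acc = binop(acc, x)
    return acc

assert reduce_call_model(lambda a, b: a + b, [1, 2, 3, 4]) == 10
assert reduce_call_model(lambda a, b: a + b, [1, 2, 3, 4], size=2) == 3
assert reduce_call_model(lambda a, b: a + b, [1, 2, 3, 4], init=100) == 110
```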