Writing a reduction algorithm for a CUDA GPU can be tricky. Numba provides a
@reduce decorator for converting a simple binary operation into a reduction
kernel.

Example:
    import numpy
    from numba import cuda

    @cuda.reduce
    def sum_reduce(a, b):
        return a + b

    A = numpy.arange(1234, dtype=numpy.float64) + 1
    expect = A.sum()     # numpy sum reduction
    got = sum_reduce(A)  # cuda sum reduction
    assert expect == got
Lambda functions can also be used here:

    sum_reduce = cuda.reduce(lambda a, b: a + b)
The reduce decorator creates an instance of the Reduce class. (Currently,
reduce is an alias to Reduce, but this behavior is not guaranteed.)
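Because of that alias, the object returned by the decorator is itself a
Reduce instance. A quick check of this (relying on the alias behavior, which
as noted is not guaranteed):

    from numba import cuda

    sum_reduce = cuda.reduce(lambda a, b: a + b)
    assert isinstance(sum_reduce, cuda.Reduce)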
numba.cuda.Reduce(functor)

__call__(self, arr, size=None, res=None, init=0, stream=0)

Performs a full reduction.

Parameters:
    arr – A host or device array.
    size – Optional integer specifying the number of elements in arr to
    reduce. If this parameter is not specified, the entire array is reduced.
    res – Optional device array into which to write the reduction result.
    The result is written into the first element of this array. If this
    parameter is specified, then no communication of the reduction output
    takes place from the device to the host.
    init – Optional initial value for the reduction, the type of which must
    match arr.dtype.
    stream – Optional CUDA stream in which to perform the reduction. If no
    stream is specified, the default stream of 0 is used.

Returns:
    If res is specified, None is returned. Otherwise, the result of the
    reduction is returned.
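The optional parameters can be combined. The sketch below (variable names are
illustrative) reduces only the first 100 elements of a device array, seeds
the reduction with an initial value, writes the result into a device array,
and runs on a user-created stream:

    import numpy
    from numba import cuda

    sum_reduce = cuda.reduce(lambda a, b: a + b)

    A = numpy.arange(1234, dtype=numpy.float64) + 1
    stream = cuda.stream()
    dev_A = cuda.to_device(A, stream=stream)       # input stays on the device
    dev_res = cuda.device_array(1, dtype=A.dtype)  # receives the result

    # Reduce only the first 100 elements, starting from an initial value of
    # 10.0, writing the result into dev_res[0] on the given stream.
    sum_reduce(dev_A, size=100, res=dev_res, init=10.0, stream=stream)
    stream.synchronize()

    assert dev_res.copy_to_host()[0] == A[:100].sum() + 10.0

Because res is given, the call returns None and the result never leaves the
device until copy_to_host() is called explicitly.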
__init__(self, functor)

Create a reduction object that reduces values using a given binary function.
The binary function is compiled once and cached inside this object. Keeping
this object alive will prevent re-compilation.

Parameters:
    functor – A function to be compiled as a CUDA device function that will
    be used as the binary operation for reduction on a CUDA device.
    Internally, it is compiled using cuda.jit(device=True).
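A Reduce object can also be constructed directly, which is equivalent to
using the decorator. A brief sketch with a max reduction (assuming the
built-in max is supported in device code for two scalar arguments, as it is
for other Numba device functions):

    import numpy
    from numba import cuda

    # The lambda is compiled as a device function on first use and cached
    # inside the Reduce object; reusing max_reduce avoids re-compilation.
    max_reduce = cuda.Reduce(lambda a, b: max(a, b))

    A = numpy.random.rand(1000)        # float64 values in [0, 1)
    assert max_reduce(A, init=0.0) == A.max()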