GPU Reduction
==============

Writing a reduction algorithm for a CUDA GPU can be tricky. Numba provides a
``@reduce`` decorator for converting a simple binary operation into a
reduction kernel. An example follows::

    import numpy
    from numba import cuda

    @cuda.reduce
    def sum_reduce(a, b):
        return a + b

    A = (numpy.arange(1234, dtype=numpy.float64)) + 1
    expect = A.sum()      # numpy sum reduction
    got = sum_reduce(A)   # cuda sum reduction
    assert expect == got

Lambda functions can also be used here::

    sum_reduce = cuda.reduce(lambda a, b: a + b)

The Reduce class
----------------

The ``reduce`` decorator creates an instance of the ``Reduce`` class.
Currently, ``reduce`` is an alias to ``Reduce``, but this behavior is not
guaranteed.

.. autoclass:: numba.cuda.Reduce
    :members: __init__, __call__
    :member-order: bysource
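
As a further illustration, the same call pattern shown above works for other
associative, commutative binary operations. The following sketch (the
``max_reduce`` name is illustrative, not part of the Numba API) builds a
maximum reduction from a lambda and checks it against NumPy::

    import numpy
    from numba import cuda

    # Any associative, commutative binary operation can serve as the
    # reduction operator; here the lambda computes the pairwise maximum.
    max_reduce = cuda.reduce(lambda a, b: a if a > b else b)

    A = (numpy.arange(1234, dtype=numpy.float64)) + 1
    expect = A.max()       # numpy max reduction
    got = max_reduce(A)    # cuda max reduction
    assert expect == got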