.. Copyright (c) 2017 Intel Corporation
   SPDX-License-Identifier: BSD-2-Clause

.. _arch-stencil:

=================
Notes on stencils 
=================

Numba provides the :ref:`@stencil decorator <numba-stencil>` to
represent stencil computations.  This document explains how this
feature is implemented in the several different modes available in
Numba.  Currently, calls to the stencil from non-jitted code is
supported as well as calls from jitted code, either with or without
the :ref:`parallel=True <parallel_jit_option>` option.

The stencil decorator
=====================

The stencil decorator itself just returns a ``StencilFunc`` object.
This object encapsulates the original stencil kernel function
as specified in the program and the options passed to the
stencil decorator.  Also of note is that after the first compilation
of the stencil, the computed neighborhood of the stencil is
stored in the ``StencilFunc`` object in the ``neighborhood`` attribute.

Handling the three modes
========================

As mentioned above, Numba supports the calling of stencils
from inside or outside a ``@jit`` compiled function, with or
without the :ref:`parallel=True <parallel_jit_option>` option.

Outside jit context
-------------------

``StencilFunc`` overrides the ``__call__`` method so that calls
to ``StencilFunc`` objects execute the stencil::

    def __call__(self, *args, **kwargs):                                        
        result = kwargs.get('out')
                                                                                
        new_stencil_func = self._stencil_wrapper(result, None, *args)           
                                                                                
        if result is None:                                                      
            return new_stencil_func.entry_point(*args)                          
        else:                                                                   
            return new_stencil_func.entry_point(*args, result)                  

First, the presence of the optional :ref:`out <stencil-function-out>`
parameter is checked.  If it is present then the output array is
stored in ``result``.  Then, the call to ``_stencil_wrapper``
generates the stencil function given the result and argument types
and finally the generated stencil function is executed and its result
returned.

Jit without ``parallel=True``
-----------------------------

When constructed, a ``StencilFunc`` inserts itself into the typing
context's set of user functions and provides the ``_type_me``
callback.  In this way, the standard Numba compiler is able to
determine the output type and signature of a ``StencilFunc``.
Each ``StencilFunc`` maintains a cache of previously seen combinations
of input argument types and keyword types.  If previously seen,
the ``StencilFunc`` returns the computed signature.  If not previously
computed, the ``StencilFunc`` computes the return type of the stencil
by running the Numba compiler frontend on the stencil kernel and
then performing type inference on the :term:`Numba IR` (IR) to get the scalar
return type of the kernel.  From that, a Numpy array type is constructed
whose element type matches that scalar return type.

After computing the signature of the stencil for a previously
unseen combination of input and keyword types, the ``StencilFunc``
then :ref:`creates the stencil function <arch-stencil-create-function>` itself.
``StencilFunc`` then installs the new stencil function's definition
in the target context so that jitted code is able to call it.

Thus, in this mode, the generated stencil function is a stand-alone
function called like a normal function from within jitted code.

Jit with ``parallel=True``
--------------------------

When calling a ``StencilFunc`` from a jitted context with ``parallel=True``,
a separate stencil function as generated by :ref:`arch-stencil-create-function`
is not used.  Instead, `parfors` (:ref:`parallel-accelerator`) are
created within the current function that implements the stencil.
This code again starts with the stencil kernel and does a similar kernel
size computation but then rather than standard Python looping syntax,
corresponding `parfors` are created so that the execution of the stencil
will take place in parallel.

The stencil to `parfor` translations can also be selectively disabled
by setting ``parallel={'stencil': False}``, among other sub-options
described in :ref:`parallel-accelerator`.

.. _arch-stencil-create-function:

Creating the stencil function
=============================

Conceptually, a stencil function is created from the user-specified
stencil kernel by adding looping code around the kernel, transforming
the relative kernel indices into absolute array indices based on the
loop indices, and replacing the kernel's ``return`` statement with
a statement to assign the computed value into the output array.

To accomplish this transformation, first, a copy of the stencil 
kernel IR is created so that subsequent modifications of the IR
for different stencil signatures will not effect each other.

Then, an approach similar to how GUFunc's are created for `parfors`
is employed.  In a text buffer, a Python function is created with
a unique name.  The input array parameter is added to the function
definition and if the ``out`` argument type is present then an
``out`` parameter is added to the stencil function definition.
If the ``out`` argument is not present then first an output array
is created with ``numpy.zeros`` having the same shape as the
input array.

The kernel is then analyzed to compute the stencil size and the
shape of the boundary (or the ``neighborhood`` stencil decorator
argument is used for this purpose if present).
Then, one ``for`` loop for each dimension of the input array is
added to the stencil function definition.  The range of each
loop is controlled by the stencil kernel size previously computed
so that the boundary of the output image is not modified but instead
left as is.   The body of the innermost ``for`` loop is a single
``sentinel`` statement that is easily recognized in the IR.
A call to ``exec`` with the text buffer is used to force the
stencil function into existence and an ``eval`` is used to get
access to the corresponding function on which ``run_frontend`` is
used to get the stencil function IR.

Various renaming and relabeling is performed on the stencil function
IR and the kernel IR so that the two can be combined without conflict.
The relative indices in the kernel IR (i.e., ``getitem`` calls) are
replaced with expressions where the corresponding loop index variables
are added to the relative indices.  The ``return`` statement in the
kernel IR is replaced with a ``setitem`` for the corresponding element
in the output array.
The stencil function IR is then scanned for the sentinel and the
sentinel replaced with the modified kernel IR.

Next, ``compile_ir`` is used to compile the combined stencil function
IR.  The resulting compile result is cached in the ``StencilFunc`` so that
other calls to the same stencil do not need to undertake this process
again.

Exceptions raised
=================

Various checks are performed during stencil compilation to make sure
that user-specified options do not conflict with each other or with
other runtime parameters.  For example, if the user has manually
specified a ``neighborhood`` to the stencil decorator, the length of
that neighborhood must match the dimensionality of the input array.
If this is not the case, a ``ValueError`` is raised.

If the neighborhood has not been specified then it must be inferred
and a requirement to infer the kernel is that all indices are constant
integers.  If they are not, a ``ValueError`` is raised indicating that
kernel indices may not be non-constant.

Finally, the stencil implementation detects the output array type
by running Numba type inference on the stencil kernel.  If the
return type of this kernel does not match the type of the value
passed to the ``cval`` stencil decorator option then a ``ValueError``
is raised.