Numba provides the @stencil decorator to represent stencil computations. This document explains how this feature is implemented in each of the modes available in Numba. Currently, calls to the stencil from non-jitted code are supported, as are calls from jitted code, either with or without the parallel=True option.
The stencil decorator itself just returns a StencilFunc object. This object encapsulates the original stencil kernel function as specified in the program and the options passed to the stencil decorator. Also of note is that after the first compilation of the stencil, the computed neighborhood of the stencil is stored in the StencilFunc object in the neighborhood attribute.
As mentioned above, Numba supports the calling of stencils from inside or outside a @jit compiled function, with or without the parallel=True option. StencilFunc overrides the __call__ method so that calls to StencilFunc objects execute the stencil:
def __call__(self, *args, **kwargs):
    result = kwargs.get('out')
    new_stencil_func = self._stencil_wrapper(result, None, *args)
    if result is None:
        return new_stencil_func.entry_point(*args)
    else:
        return new_stencil_func.entry_point(*args, result)
First, the presence of the optional out parameter is checked. If it is present then the output array is stored in result. Then, the call to _stencil_wrapper generates the stencil function given the result and argument types, and finally the generated stencil function is executed and its result returned.
Jit without parallel=True
When constructed, a StencilFunc inserts itself into the typing context's set of user functions and provides the _type_me callback. In this way, the standard Numba compiler is able to determine the output type and signature of a StencilFunc.
Each StencilFunc maintains a cache of previously seen combinations of input argument types and keyword types. If a combination has been seen before, the StencilFunc returns the cached signature. Otherwise, the StencilFunc computes the return type of the stencil by running the Numba compiler frontend on the stencil kernel and then performing type inference on the Numba intermediate representation (IR) to get the scalar return type of the kernel. From that, a NumPy array type is constructed whose element type matches that scalar return type.
After computing the signature of the stencil for a previously unseen combination of input and keyword types, the StencilFunc then creates the stencil function itself. StencilFunc then installs the new stencil function's definition in the target context so that jitted code is able to call it.
Thus, in this mode, the generated stencil function is a stand-alone function called like a normal function from within jitted code.
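The caching behaviour described above can be pictured with a small pure-Python sketch. This is not Numba's code: `ToyStencilTyper`, `infer_scalar_type`, and the representation of a type as a `(dtype name, ndim)` tuple are all hypothetical stand-ins chosen for illustration.

```python
class ToyStencilTyper:
    """Hypothetical stand-in for the per-StencilFunc signature cache."""

    def __init__(self, infer_scalar_type):
        # infer_scalar_type plays the role of "run the frontend and type
        # inference on the kernel"; in this toy it is just a callable.
        self._infer_scalar_type = infer_scalar_type
        self._cache = {}
        self.inference_runs = 0

    def signature_for(self, arg_types, kw_types):
        # The cache key combines positional argument types and keyword types.
        key = (tuple(arg_types), tuple(sorted(kw_types.items())))
        if key in self._cache:
            return self._cache[key]
        # Unseen combination: infer the scalar return type of the kernel.
        self.inference_runs += 1
        scalar = self._infer_scalar_type(arg_types)
        ndim = arg_types[0][1]
        # The return type is an array whose element type is that scalar type.
        signature = (("array", scalar, ndim), key)
        self._cache[key] = signature
        return signature


# Toy inference rule (an assumption): the kernel returns the input's dtype.
typer = ToyStencilTyper(lambda arg_types: arg_types[0][0])
first = typer.signature_for([("float64", 2)], {})
second = typer.signature_for([("float64", 2)], {})   # served from the cache
third = typer.signature_for([("int64", 1)], {"out": ("int64", 1)})
```

Only the first and third calls trigger inference in this toy; the second returns the cached object.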
Jit with parallel=True
When calling a StencilFunc from a jitted context with parallel=True, a separate stencil function as described in "Creating the stencil function" is not used. Instead, parfors (see "Stage 6b: Perform Automatic Parallelization") that implement the stencil are created within the current function. This code again starts with the stencil kernel and does a similar kernel size computation, but rather than using standard Python looping syntax, it creates corresponding parfors so that the execution of the stencil will take place in parallel.
The stencil-to-parfor translation can also be selectively disabled by setting parallel={'stencil': False}, among other sub-options described in "Stage 6b: Perform Automatic Parallelization".
Creating the stencil function
Conceptually, a stencil function is created from the user-specified stencil kernel by adding looping code around the kernel, transforming the relative kernel indices into absolute array indices based on the loop indices, and replacing the kernel's return statement with a statement to assign the computed value into the output array.
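As a rough illustration of that transformation, here is a hand-written one-dimensional example in plain Python. The names `kernel` and `generated_stencil` are hypothetical, and a list stands in for the NumPy arrays Numba actually uses; the real generated code is produced at the IR level rather than written by hand.

```python
def kernel(a):
    # User-written stencil kernel: indices are relative to the current element.
    # Shown for comparison only; it is not meant to be called directly here.
    return 0.5 * (a[-1] + a[1])


def generated_stencil(a):
    # Hand-written equivalent of the function built around the kernel.
    out = [0.0] * len(a)              # stands in for the zero-filled output
    for i in range(1, len(a) - 1):    # border elements are skipped
        # a[-1] becomes a[i - 1], a[1] becomes a[i + 1],
        # and `return value` becomes `out[i] = value`.
        out[i] = 0.5 * (a[i - 1] + a[i + 1])
    return out


result = generated_stencil([0.0, 2.0, 4.0, 6.0, 8.0])
# result == [0.0, 2.0, 4.0, 6.0, 0.0]
```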
To accomplish this transformation, first, a copy of the stencil kernel IR is created so that subsequent modifications of the IR for different stencil signatures will not affect each other.
Then, an approach similar to how GUFuncs are created for parfors is employed. In a text buffer, a Python function is created with a unique name. The input array parameter is added to the function definition, and if the out argument type is present then an out parameter is added to the stencil function definition. If the out argument is not present then an output array is first created with numpy.zeros, having the same shape as the input array.
The kernel is then analyzed to compute the stencil size and the shape of the boundary (or the neighborhood stencil decorator argument is used for this purpose if present).
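To make the kernel analysis concrete, the sketch below infers per-dimension offset bounds from a kernel's source using Python's `ast` module (Python 3.9 or later is assumed). Numba performs this analysis on its own IR, not on the Python AST; `infer_neighborhood`, its return format, and the choice to start every bound at `(0, 0)` are assumptions made for this illustration.

```python
import ast


def infer_neighborhood(kernel_source, array_name):
    """Return [(min_offset, max_offset), ...], one pair per dimension."""
    bounds = None
    for node in ast.walk(ast.parse(kernel_source)):
        is_array_access = (
            isinstance(node, ast.Subscript)
            and isinstance(node.value, ast.Name)
            and node.value.id == array_name
        )
        if not is_array_access:
            continue
        index = node.slice
        elements = index.elts if isinstance(index, ast.Tuple) else [index]
        offsets = []
        for element in elements:
            try:
                value = ast.literal_eval(element)   # handles -1, 0, 2, ...
            except ValueError:
                raise ValueError("kernel indices must be constant integers") from None
            if not isinstance(value, int):
                raise ValueError("kernel indices must be constant integers")
            offsets.append(value)
        if bounds is None:
            bounds = [(0, 0)] * len(offsets)
        # Widen each dimension's bounds to cover this access.
        bounds = [(min(lo, off), max(hi, off))
                  for (lo, hi), off in zip(bounds, offsets)]
    return bounds


source = "def kernel(a):\n    return a[-1, 0] + a[0, 2] + a[1, -1]\n"
neighborhood = infer_neighborhood(source, "a")
# neighborhood == [(-1, 1), (-1, 2)]
```

In this toy, a kernel that indexes with a variable (for example `a[i]`) is rejected with a `ValueError`, since the bounds cannot be determined statically.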
Then, one for loop for each dimension of the input array is added to the stencil function definition. The range of each loop is controlled by the stencil kernel size previously computed so that the boundary of the output array is not modified but instead left as is. The body of the innermost for loop is a single sentinel statement that is easily recognized in the IR.
A call to exec with the text buffer is used to force the stencil function into existence, and an eval is used to get access to the corresponding function, on which run_frontend is used to get the stencil function IR.
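The text-buffer technique can be sketched in plain Python as follows. The helper `build_loop_nest_source`, the generated function name prefix, the single `halo` width, and the sentinel spelling are hypothetical choices for this example; the step where Numba obtains IR with run_frontend is omitted.

```python
import itertools

_counter = itertools.count()
SENTINEL = "__sentinel__ = 0"   # placeholder statement, easy to find later


def build_loop_nest_source(ndim, halo):
    """Return (unique function name, source text) for an ndim-deep loop nest."""
    name = f"__toy_stencil_{next(_counter)}"
    lines = [f"def {name}(a, out):"]
    indent = "    "
    for d in range(ndim):
        # One loop per dimension; the halo keeps the border untouched.
        lines.append(f"{indent}for i{d} in range({halo}, a.shape[{d}] - {halo}):")
        indent += "    "
    lines.append(f"{indent}{SENTINEL}")
    lines.append("    return out")
    return name, "\n".join(lines) + "\n"


name, source = build_loop_nest_source(ndim=2, halo=1)
namespace = {}
exec(source, namespace)               # force the function into existence
stencil_func = eval(name, namespace)  # look the function up by its name
```

Generating a fresh name on every call avoids collisions when several stencils, or several signatures of one stencil, are built in the same process.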
Various renaming and relabeling steps are performed on the stencil function IR and the kernel IR so that the two can be combined without conflict. The relative indices in the kernel IR (i.e., getitem calls) are replaced with expressions where the corresponding loop index variables are added to the relative indices. The return statement in the kernel IR is replaced with a setitem for the corresponding element in the output array. The stencil function IR is then scanned for the sentinel, and the sentinel is replaced with the modified kernel IR.
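The index rewriting and sentinel replacement can be mimicked on the Python AST, as in the sketch below (Python 3.9 or later is assumed). Numba does this on its own IR; `KernelRewriter`, `SentinelReplacer`, and the fixed source strings are hypothetical and only mirror the idea for a one-dimensional kernel.

```python
import ast

KERNEL_SOURCE = "def kernel(a):\n    return 0.5 * (a[-1] + a[1])\n"
LOOP_SOURCE = (
    "def toy_stencil(a, out):\n"
    "    for i0 in range(1, len(a) - 1):\n"
    "        __sentinel__ = 0\n"
    "    return out\n"
)


class KernelRewriter(ast.NodeTransformer):
    """Turn relative indices into absolute ones and `return` into a store."""

    def visit_Subscript(self, node):
        self.generic_visit(node)
        if isinstance(node.value, ast.Name) and node.value.id == "a":
            # a[k] becomes a[i0 + k]
            node.slice = ast.BinOp(ast.Name("i0", ast.Load()), ast.Add(), node.slice)
        return node

    def visit_Return(self, node):
        self.generic_visit(node)
        # `return value` becomes `out[i0] = value`
        target = ast.Subscript(ast.Name("out", ast.Load()),
                               ast.Name("i0", ast.Load()), ast.Store())
        return ast.Assign(targets=[target], value=node.value)


class SentinelReplacer(ast.NodeTransformer):
    """Swap the sentinel assignment for the rewritten kernel statements."""

    def __init__(self, replacement):
        self.replacement = replacement

    def visit_Assign(self, node):
        target = node.targets[0]
        if isinstance(target, ast.Name) and target.id == "__sentinel__":
            return self.replacement
        return node


kernel_body = ast.parse(KERNEL_SOURCE).body[0].body
new_body = [KernelRewriter().visit(stmt) for stmt in kernel_body]
combined = SentinelReplacer(new_body).visit(ast.parse(LOOP_SOURCE))
ast.fix_missing_locations(combined)

namespace = {}
exec(compile(combined, "<toy_stencil>", "exec"), namespace)
result = namespace["toy_stencil"]([0.0, 2.0, 4.0, 6.0, 8.0], [0.0] * 5)
# result == [0.0, 2.0, 4.0, 6.0, 0.0]
```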
Next, compile_ir is used to compile the combined stencil function IR. The resulting compile result is cached in the StencilFunc so that other calls to the same stencil do not need to undertake this process again.
Various checks are performed during stencil compilation to make sure that user-specified options do not conflict with each other or with other runtime parameters. For example, if the user has manually specified a neighborhood to the stencil decorator, the length of that neighborhood must match the dimensionality of the input array. If this is not the case, a ValueError is raised.
If the neighborhood has not been specified then it must be inferred, and inference requires that all kernel indices be constant integers. If they are not, a ValueError is raised indicating that kernel indices must be constant.
Finally, the stencil implementation detects the output array type by running Numba type inference on the stencil kernel. If the return type of this kernel does not match the type of the value passed to the cval stencil decorator option then a ValueError is raised.
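The neighborhood-length and cval checks can be summarised with a short sketch. `check_stencil_options` and `is_rejected` are hypothetical helpers, Python types stand in for Numba types, and the error messages are invented for this example; the real checks operate on Numba's inferred types.

```python
def check_stencil_options(neighborhood, input_ndim, kernel_return_type, cval):
    """Raise ValueError when the options conflict (toy version)."""
    if neighborhood is not None and len(neighborhood) != input_ndim:
        raise ValueError(
            f"neighborhood has {len(neighborhood)} entries but the input "
            f"array has {input_ndim} dimensions"
        )
    if cval is not None and not isinstance(cval, kernel_return_type):
        raise ValueError(
            f"cval type {type(cval).__name__} does not match kernel return "
            f"type {kernel_return_type.__name__}"
        )


def is_rejected(**options):
    # Convenience wrapper so the outcomes can be compared as booleans.
    try:
        check_stencil_options(**options)
    except ValueError:
        return True
    return False


ok = is_rejected(neighborhood=((-1, 1), (-1, 1)), input_ndim=2,
                 kernel_return_type=float, cval=0.0)
bad_ndim = is_rejected(neighborhood=((-1, 1),), input_ndim=2,
                       kernel_return_type=float, cval=0.0)
bad_cval = is_rejected(neighborhood=None, input_ndim=1,
                       kernel_return_type=float, cval=0)
```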