.. Copyright (c) 2017 Intel Corporation SPDX-License-Identifier: BSD-2-Clause .. _arch-stencil: ================= Notes on stencils ================= Numba provides the :ref:`@stencil decorator ` to represent stencil computations. This document explains how this feature is implemented in the several different modes available in Numba. Currently, calls to the stencil from non-jitted code is supported as well as calls from jitted code, either with or without the :ref:`parallel=True ` option. The stencil decorator ===================== The stencil decorator itself just returns a ``StencilFunc`` object. This object encapsulates the original stencil kernel function as specified in the program and the options passed to the stencil decorator. Also of note is that after the first compilation of the stencil, the computed neighborhood of the stencil is stored in the ``StencilFunc`` object in the ``neighborhood`` attribute. Handling the three modes ======================== As mentioned above, Numba supports the calling of stencils from inside or outside a ``@jit`` compiled function, with or without the :ref:`parallel=True ` option. Outside jit context ------------------- ``StencilFunc`` overrides the ``__call__`` method so that calls to ``StencilFunc`` objects execute the stencil:: def __call__(self, *args, **kwargs): result = kwargs.get('out') new_stencil_func = self._stencil_wrapper(result, None, *args) if result is None: return new_stencil_func.entry_point(*args) else: return new_stencil_func.entry_point(*args, result) First, the presence of the optional :ref:`out ` parameter is checked. If it is present then the output array is stored in ``result``. Then, the call to ``_stencil_wrapper`` generates the stencil function given the result and argument types and finally the generated stencil function is executed and its result returned. Jit without ``parallel=True`` ----------------------------- When constructed, a ``StencilFunc`` inserts itself into the typing context's set of user functions and provides the ``_type_me`` callback. In this way, the standard Numba compiler is able to determine the output type and signature of a ``StencilFunc``. Each ``StencilFunc`` maintains a cache of previously seen combinations of input argument types and keyword types. If previously seen, the ``StencilFunc`` returns the computed signature. If not previously computed, the ``StencilFunc`` computes the return type of the stencil by running the Numba compiler frontend on the stencil kernel and then performing type inference on the :term:`Numba IR` (IR) to get the scalar return type of the kernel. From that, a Numpy array type is constructed whose element type matches that scalar return type. After computing the signature of the stencil for a previously unseen combination of input and keyword types, the ``StencilFunc`` then :ref:`creates the stencil function ` itself. ``StencilFunc`` then installs the new stencil function's definition in the target context so that jitted code is able to call it. Thus, in this mode, the generated stencil function is a stand-alone function called like a normal function from within jitted code. Jit with ``parallel=True`` -------------------------- When calling a ``StencilFunc`` from a jitted context with ``parallel=True``, a separate stencil function as generated by :ref:`arch-stencil-create-function` is not used. Instead, `parfors` (:ref:`parallel-accelerator`) are created within the current function that implements the stencil. This code again starts with the stencil kernel and does a similar kernel size computation but then rather than standard Python looping syntax, corresponding `parfors` are created so that the execution of the stencil will take place in parallel. The stencil to `parfor` translations can also be selectively disabled by setting ``parallel={'stencil': False}``, among other sub-options described in :ref:`parallel-accelerator`. .. _arch-stencil-create-function: Creating the stencil function ============================= Conceptually, a stencil function is created from the user-specified stencil kernel by adding looping code around the kernel, transforming the relative kernel indices into absolute array indices based on the loop indices, and replacing the kernel's ``return`` statement with a statement to assign the computed value into the output array. To accomplish this transformation, first, a copy of the stencil kernel IR is created so that subsequent modifications of the IR for different stencil signatures will not effect each other. Then, an approach similar to how GUFunc's are created for `parfors` is employed. In a text buffer, a Python function is created with a unique name. The input array parameter is added to the function definition and if the ``out`` argument type is present then an ``out`` parameter is added to the stencil function definition. If the ``out`` argument is not present then first an output array is created with ``numpy.zeros`` having the same shape as the input array. The kernel is then analyzed to compute the stencil size and the shape of the boundary (or the ``neighborhood`` stencil decorator argument is used for this purpose if present). Then, one ``for`` loop for each dimension of the input array is added to the stencil function definition. The range of each loop is controlled by the stencil kernel size previously computed so that the boundary of the output image is not modified but instead left as is. The body of the innermost ``for`` loop is a single ``sentinel`` statement that is easily recognized in the IR. A call to ``exec`` with the text buffer is used to force the stencil function into existence and an ``eval`` is used to get access to the corresponding function on which ``run_frontend`` is used to get the stencil function IR. Various renaming and relabeling is performed on the stencil function IR and the kernel IR so that the two can be combined without conflict. The relative indices in the kernel IR (i.e., ``getitem`` calls) are replaced with expressions where the corresponding loop index variables are added to the relative indices. The ``return`` statement in the kernel IR is replaced with a ``setitem`` for the corresponding element in the output array. The stencil function IR is then scanned for the sentinel and the sentinel replaced with the modified kernel IR. Next, ``compile_ir`` is used to compile the combined stencil function IR. The resulting compile result is cached in the ``StencilFunc`` so that other calls to the same stencil do not need to undertake this process again. Exceptions raised ================= Various checks are performed during stencil compilation to make sure that user-specified options do not conflict with each other or with other runtime parameters. For example, if the user has manually specified a ``neighborhood`` to the stencil decorator, the length of that neighborhood must match the dimensionality of the input array. If this is not the case, a ``ValueError`` is raised. If the neighborhood has not been specified then it must be inferred and a requirement to infer the kernel is that all indices are constant integers. If they are not, a ``ValueError`` is raised indicating that kernel indices may not be non-constant. Finally, the stencil implementation detects the output array type by running Numba type inference on the stencil kernel. If the return type of this kernel does not match the type of the value passed to the ``cval`` stencil decorator option then a ``ValueError`` is raised.