1.11. Using the @stencil decorator

Stencils are a common computational pattern in which array elements are updated according to some fixed pattern called the stencil kernel. Numba provides the @stencil decorator so that users may easily specify a stencil kernel and Numba then generates the looping code necessary to apply that kernel to some input array. Thus, the stencil decorator allows clearer, more concise code and in conjunction with the parallel jit option enables higher performance through parallelization of the stencil execution.

1.11.1. Basic usage

An example use of the @stencil decorator:

from numba import stencil

@stencil
def kernel1(a):
    return 0.25 * (a[0, 1] + a[1, 0] + a[0, -1] + a[-1, 0])

The stencil kernel is specified by what looks like a standard Python function definition but there are different semantics with respect to array indexing. Stencils produce an output array of the same size and shape as the input array although depending on the kernel definition may have a different type. Conceptually, the stencil kernel is run once for each element in the output array. The return value from the stencil kernel is the value written into the output array for that particular element.

The parameter a represents the input array over which the kernel is applied. Indexing into this array takes place with respect to the current element of the output array being processed. For example, if element (x, y) is being processed then a[0, 0] in the stencil kernel corresponds to a[x + 0, y + 0] in the input array. Similarly, a[-1, 1] in the stencil kernel corresponds to a[x - 1, y + 1] in the input array.

Depending on the specified kernel, the kernel may not be applicable to the borders of the output array as this may cause the input array to be accessed out-of-bounds. The way in which the stencil decorator handles this situation is dependent upon which func_or_mode is selected. The default mode is for the stencil decorator to set the border elements of the output array to zero.

To invoke a stencil on an input array, call the stencil as if it were a regular function and pass the input array as the argument. For example, using the kernel defined above:

>>> import numpy as np
>>> input_arr = np.arange(100).reshape((10, 10))
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
       [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
       [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
       [70, 71, 72, 73, 74, 75, 76, 77, 78, 79],
       [80, 81, 82, 83, 84, 85, 86, 87, 88, 89],
       [90, 91, 92, 93, 94, 95, 96, 97, 98, 99]])
>>> output_arr = kernel1(input_arr)
array([[  0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.],
       [  0.,  11.,  12.,  13.,  14.,  15.,  16.,  17.,  18.,   0.],
       [  0.,  21.,  22.,  23.,  24.,  25.,  26.,  27.,  28.,   0.],
       [  0.,  31.,  32.,  33.,  34.,  35.,  36.,  37.,  38.,   0.],
       [  0.,  41.,  42.,  43.,  44.,  45.,  46.,  47.,  48.,   0.],
       [  0.,  51.,  52.,  53.,  54.,  55.,  56.,  57.,  58.,   0.],
       [  0.,  61.,  62.,  63.,  64.,  65.,  66.,  67.,  68.,   0.],
       [  0.,  71.,  72.,  73.,  74.,  75.,  76.,  77.,  78.,   0.],
       [  0.,  81.,  82.,  83.,  84.,  85.,  86.,  87.,  88.,   0.],
       [  0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.,   0.]])
>>> input_arr.dtype
dtype('int64')
>>> output_arr.dtype
dtype('float64')

Note that the stencil decorator has determined that the output type of the specified stencil kernel is float64 and has thus created the output array as float64 while the input array is of type int64.

1.11.2. Stencil Parameters

Stencil kernel definitions may take any number of arguments with the following provisions. The first argument must be an array. The size and shape of the output array will be the same as that of the first argument. Additional arguments may either be scalars or arrays. For array arguments, those arrays must be at least as large as the first argument (array) in each dimension. Array indexing is relative for all such input array arguments.

1.11.3. Kernel shape inference and border handling

In the above example and in most cases, the array indexing in the stencil kernel will exclusively use Integer literals. In such cases, the stencil decorator is able to analyze the stencil kernel to determine its size. In the above example, the stencil decorator determines that the kernel is 3 x 3 in shape since indices -1 to 1 are used for both the first and second dimensions. Note that the stencil decorator also correctly handles non-symmetric and non-square stencil kernels.

Based on the size of the stencil kernel, the stencil decorator is able to compute the size of the border in the output array. If applying the kernel to some element of input array would cause an index to be out-of-bounds then that element belongs to the border of the output array. In the above example, points -1 and +1 are accessed in each dimension and thus the output array has a border of size one in all dimensions.

The parallel mode is able to infer kernel indices as constants from simple expressions if possible. For example:

@njit(parallel=True)
def stencil_test(A):
    c = 2
    B = stencil(
        lambda a, c: 0.3 * (a[-c+1] + a[0] + a[c-1]))(A, c)
    return B

1.11.4. Stencil decorator options

Note

The stencil decorator may be augmented in the future to provide additional mechanisms for border handling. At present, only one behaviour is implemented, "constant" (see func_or_mode below for details).

1.11.4.1. neighborhood

Sometimes it may be inconvenient to write the stencil kernel exclusively with Integer literals. For example, let us say we would like to compute the trailing 30-day moving average of a time series of data. One could write (a[-29] + a[-28] + ... + a[-1] + a[0]) / 30 but the stencil decorator offers a more concise form using the neighborhood option:

@stencil(neighborhood = ((-29, 0),))
def kernel2(a):
    cumul = 0
    for i in range(-29, 1):
        cumul += a[i]
    return cumul / 30

The neighborhood option is a tuple of tuples. The outer tuple’s length is equal to the number of dimensions of the input array. The inner tuple’s lengths are always two because each element of the outer tuple corresponds to minimum and maximum index offsets used in the corresponding dimension.

If a user specifies a neighborhood but the kernel accesses elements outside the specified neighborhood, the behavior is undefined.

1.11.4.2. func_or_mode

The optional func_or_mode parameter controls how the border of the output array is handled. Currently, there is only one supported value, "constant". In constant mode, the stencil kernel is not applied in cases where the kernel would access elements outside the valid range of the input array. In such cases, those elements in the output array are assigned to a constant value, as specified by the cval parameter.

1.11.4.3. cval

The optional cval parameter defaults to zero but can be set to any desired value, which is then used for the border of the output array if the func_or_mode parameter is set to constant. The cval parameter is ignored in all other modes. The type of the cval parameter must match the return type of the stencil kernel. If the user wishes the output array to be constructed from a particular type then they should ensure that the stencil kernel returns that type.

1.11.4.4. standard_indexing

By default, all array accesses in a stencil kernel are processed as relative indices as described above. However, sometimes it may be advantageous to pass an auxiliary array (e.g. an array of weights) to a stencil kernel and have that array use standard Python indexing rather than relative indexing. For this purpose, there is the stencil decorator option standard_indexing whose value is a collection of strings whose names match those parameters to the stencil function that are to be accessed with standard Python indexing rather than relative indexing:

@stencil(standard_indexing=("b",))
def kernel3(a, b):
    return a[-1] * b[0] + a[0] + b[1]

1.11.5. StencilFunc

The stencil decorator returns a callable object of type StencilFunc. StencilFunc objects contains a number of attributes but the only one of potential interest to users is the neighborhood attribute. If the neighborhood option was passed to the stencil decorator then the provided neighborhood is stored in this attribute. Else, upon first execution or compilation, the system calculates the neighborhood as described above and then stores the computed neighborhood into this attribute. A user may then inspect the attribute if they wish to verify that the calculated neighborhood is correct.

1.11.6. Stencil invocation options

Internally, the stencil decorator transforms the specified stencil kernel into a regular Python function. This function will have the same parameters as specified in the stencil kernel definition but will also include the following optional parameter.

1.11.6.1. out

The optional out parameter is added to every stencil function generated by Numba. If specified, the out parameter tells Numba that the user is providing their own pre-allocated array to be used for the output of the stencil. In this case, the stencil function will not allocate its own output array. Users should assure that the return type of the stencil kernel can be safely cast to the element-type of the user-specified output array following the Numpy ufunc casting rules.

An example usage is shown below:

>>> import numpy as np
>>> input_arr = np.arange(100).reshape((10, 10))
>>> output_arr = np.full(input_arr.shape, 0.0)
>>> kernel1(input_arr, out=output_arr)