First Steps with numba

import numba
print(numba.__version__)
0.12.0

Introduction to numba

Numba allows the compilation of selected portions of Python code to native code, using LLVM as its backend. This allows the selected functions to execute at a speed competitive with code generated by C compilers.

It works at the function level. We can take a function and generate native code for it, as well as the wrapper code needed to call it directly from Python. This compilation is done on the fly and in memory.

In this notebook I will illustrate some very simple usage of numba.

A simple example

Let’s start with a simple yet time-consuming function: a Python implementation of bubblesort. This bubblesort implementation works on a NumPy array.

def bubblesort(X):
    N = len(X)
    for end in range(N, 1, -1):
        for i in range(end - 1):
            cur = X[i]
            if cur > X[i + 1]:
                tmp = X[i]
                X[i] = X[i + 1]
                X[i + 1] = tmp

Now let’s try the function to check that it works. First we’ll create an array of sorted values and a randomly shuffled copy of it:

import numpy as np

original = np.arange(0.0, 10.0, 0.01, dtype='f4')
shuffled = original.copy()
np.random.shuffle(shuffled)

Now we’ll create a copy and do our bubble sort on the copy:

sorted = shuffled.copy()
bubblesort(sorted)
print(np.array_equal(sorted, original))
True

Let’s see how it behaves in execution time:

sorted[:] = shuffled[:]
%timeit sorted[:] = shuffled[:]; bubblesort(sorted)
1 loops, best of 3: 328 ms per loop

Note that, because execution time may depend on the input and the function sorts its argument in place, I make sure to use the same input in all the timings by copying the original shuffled array into the working array. %timeit makes several runs and reports the best result. If the copy weren’t done inside the timed code, the vector would be unsorted only in the first iteration. As bubblesort works better on vectors that are already sorted, one of the later runs would be selected, and we would get the time of running bubblesort on an already sorted array. In our case the copy time is minimal, though:

%timeit sorted[:] = shuffled[:]
1000000 loops, best of 3: 1.17 µs per loop
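
Outside IPython, the same precaution can be reproduced with the standard-library timeit module. The following is a minimal sketch and not part of the original session: it applies the same bubblesort logic to a plain Python list (an assumption made to keep the example dependency-free), and the names data, work, reset_and_sort and sort_only are illustrative. The absolute timings will differ from the ones shown above.

```python
import random
import timeit

def bubblesort(X):
    # Same algorithm as above; works on any mutable sequence.
    N = len(X)
    for end in range(N, 1, -1):
        for i in range(end - 1):
            cur = X[i]
            if cur > X[i + 1]:
                X[i] = X[i + 1]
                X[i + 1] = cur

random.seed(0)                 # fixed seed so every run shuffles the same way
data = list(range(200))
random.shuffle(data)
work = list(data)

def reset_and_sort():
    work[:] = data             # restore the shuffled input in place
    bubblesort(work)

def sort_only():
    bubblesort(work)           # after the first call, work is already sorted

# repeat() returns one total per repetition; min() mirrors %timeit's "best of 3".
best_with_reset = min(timeit.repeat(reset_and_sort, number=10, repeat=3))
best_without_reset = min(timeit.repeat(sort_only, number=10, repeat=3))
print("reset inside the timed code: %.6f s" % best_with_reset)
print("no reset (input pre-sorted): %.6f s" % best_without_reset)
```

The second figure measures mostly already-sorted input, which is exactly the distortion the in-timing copy avoids.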

Compiling a function with numba.jit using an explicit function signature

Let’s get a numba version of this code running. One way to compile a function is by using the numba.jit decorator with an explicit signature. Later, we will see that we can get by without providing such a signature by letting numba figure out the signatures by itself. However, it is useful to know what the signature is, and what role it has in numba.

First, let’s start by peeking at the numba.jit docstring:

print(numba.jit.__doc__)
jit([signature_or_function, [locals={}, [target='cpu',
            [**targetoptions]]]])

    The function can be used as the following versions:

    1) jit(signature, [target='cpu', [**targetoptions]]) -> jit(function)

        Equivalent to:

            d = dispatcher(function, targetoptions)
            d.compile(signature)

        Create a dispatcher object for a python function and default
        target-options.  Then, compile the funciton with the given signature.

        Example:

            @jit("void(int32, float32)")
            def foo(x, y):
                return x + y

    2) jit(function) -> dispatcher

        Same as old autojit.  Create a dispatcher function object that
        specialize at call site.

        Example:

            @jit
            def foo(x, y):
                return x + y

    3) jit([target='cpu', [**targetoptions]]) -> configured_jit(function)

        Same as old autojit and 2).  But configure with target and default
        target-options.


        Example:

            @jit(target='cpu', nopython=True)
            def foo(x, y):
                return x + y

    Target Options
    ---------------
    The CPU (default target) defines the following:

        - nopython: [bool]

            Set to True to disable the use of PyObjects and Python API
            calls.  The default behavior is to allow the use of PyObjects and
            Python API.  Default value is False.

        - forceobj: [bool]

            Set to True to force the use of PyObjects for every value.  Default
            value is False.

So let’s make a compiled version of our bubblesort:

bubblesort_jit = numba.jit("void(f4[:])")(bubblesort)

At this point, bubblesort_jit contains the compiled function generated from the original bubblesort function, wrapped so that it is directly callable from Python. Note the parameter “void(f4[:])” that is passed to numba.jit. That parameter describes the signature of the function to generate (more on this later).

Let’s check that it works:

sorted[:] = shuffled[:] # reset to shuffled before sorting
bubblesort_jit(sorted)
print(np.array_equal(sorted, original))
True

Now let’s compare the execution time of the compiled function with that of the original:

%timeit sorted[:] = shuffled[:]; bubblesort_jit(sorted)
1000 loops, best of 3: 1.25 ms per loop
%timeit sorted[:] = shuffled[:]; bubblesort(sorted)
1 loops, best of 3: 323 ms per loop

Bear in mind that numba.jit is a decorator, although for practical reasons in this tutorial we will be calling it like a function, so that we have access to both the original function and the jitted one. In many practical uses, the decorator syntax may be more appropriate. With the decorator syntax our sample will look like this:

@numba.jit("void(f4[:])")
def bubblesort_jit(X):
    N = len(X)
    for end in range(N, 1, -1):
        for i in range(end - 1):
            cur = X[i]
            if cur > X[i + 1]:
                tmp = X[i]
                X[i] = X[i + 1]
                X[i + 1] = tmp
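
The relationship between the two spellings is plain Python and has nothing specific to numba in it. The sketch below uses a hypothetical decorator factory, tag, as a stand-in for numba.jit called with a signature. It only records the signature string and does not compile anything; the attribute names signature and py_func are chosen for illustration.

```python
def tag(signature):
    """Hypothetical stand-in for a decorator factory such as jit(signature)."""
    def decorator(func):
        def wrapper(*args, **kwargs):
            return func(*args, **kwargs)
        wrapper.signature = signature   # remember the configuration
        wrapper.py_func = func          # keep a handle on the original function
        return wrapper
    return decorator

def add(x, y):
    return x + y

# Function-call form: the original and the wrapped version both stay accessible.
add_tagged = tag("f8(f8, f8)")(add)

# Decorator form: the name is bound directly to the wrapped version.
@tag("f8(f8, f8)")
def add_decorated(x, y):
    return x + y

print(add(1.0, 2.0), add_tagged(1.0, 2.0), add_decorated(1.0, 2.0))
```

Calling the factory first and then applying its result to the function is what the @ line does implicitly.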

Signature

In order to generate fast code, the compiler needs type information for the code. This allows a direct mapping from the Python operations to the appropriate machine instructions without any type check or dispatch mechanism. In numba, in most cases it suffices to specify the types of the parameters. In many cases, numba can deduce the types of intermediate values, as well as of the return value, using type inference. For convenience, it is also possible to specify the type of the return value in the signature.

A numba.jit compiled function will only work when called with the right type of arguments (it may, however, perform some conversions on types that it considers equivalent).

A signature contains the return type as well as the argument types. One way to specify the signature is using a string, as in our example. The signature takes the form: <return type> ( <arg1 type>, <arg2 type>, ... ). The types may be scalars or arrays (NumPy arrays). Our example, void(f4[:]), means a function with no return value (the return type is void) that takes as its only argument a one-dimensional array of 4-byte floats, f4[:]. Starting with numba version 0.12 the return type is optional. In that case the signature will look like the following: <arg1 type>, <arg2 type>, .... When the signature doesn’t provide a type for the return value, the type is inferred.

In such a string, the type of each argument is based on NumPy dtype strings for base types. Array types are also supported by using the [:] type notation, where [:] is a one-dimensional strided array, [::1] is a one-dimensional contiguous array, [:,:] a two-dimensional strided array, [:,:,:] a three-dimensional array, and so on. There are other ways to build the signature; you can find more details on signatures in its documentation page.

Some sample signatures follow:

signature               meaning
void(f4[:], u8)         a function with no return value taking a one-dimensional array of single-precision floats and a 64-bit unsigned integer.
i4(f8)                  a function returning a 32-bit signed integer taking a double-precision float as argument.
void(f4[:,:],f4[:,:])   a function with no return value taking two 2-dimensional arrays of single-precision floats as arguments.
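
The short type names in these signatures follow NumPy dtype strings, and the difference between [:] and [::1] comes down to whether the array is contiguous in memory. The following sketch uses only NumPy (nothing in it calls numba) to show what the names and the two layouts mean; the variable names are illustrative.

```python
import numpy as np

# Base type names: a kind letter followed by the size in bytes.
for code in ("f4", "f8", "i4", "u8", "i1"):
    dt = np.dtype(code)
    print(code, "->", dt.name, "(%d bytes)" % dt.itemsize)

a = np.arange(10, dtype="f4")   # contiguous layout, the case [::1] describes
b = a[::2]                      # strided view, the general case [:] describes

# A contiguous f4 array steps 4 bytes per element; the view steps 8.
print(a.flags["C_CONTIGUOUS"], a.strides)
print(b.flags["C_CONTIGUOUS"], b.strides)
```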

For a more in-depth explanation on supported types you can take a look at the “Numba types” notebook tutorial.

Compiling a function without providing a function signature (autojit functionality)

Starting with numba version 0.12, it is possible to use numba.jit without providing a type signature for the function. This functionality was provided by numba.autojit in previous versions of numba. The old numba.autojit has been deprecated in favour of this signature-less version of numba.jit.

When no type signature is provided, the decorator returns wrapper code that will automatically create and run a numba-compiled version when called. When called, the resulting function will infer the types of the arguments being used. That information will be used to generate the signature to be used when compiling. The resulting compiled function will then be called with the provided arguments.

For performance reasons, compiled functions are cached so that code is only compiled once for a given signature. It is possible to call the function with different signatures; in that case, different native code will be generated and the right version will be chosen based on the argument types.
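
To make the caching idea concrete, here is a toy model in pure Python. It is not how numba is implemented: ToyDispatcher and its _compile method are hypothetical names, and "compiling" here only means counting the event and reusing the original function. What it does show is the bookkeeping described above, with one cached specialization per distinct tuple of argument types.

```python
class ToyDispatcher:
    """Hypothetical illustration of per-signature caching (not numba's code)."""

    def __init__(self, py_func):
        self.py_func = py_func
        self.cache = {}            # argument-type tuple -> "compiled" function
        self.compile_count = 0

    def _compile(self, signature):
        # A real JIT would generate native code here; we only count the event.
        self.compile_count += 1
        return self.py_func

    def __call__(self, *args):
        signature = tuple(type(a) for a in args)   # inferred from the call
        if signature not in self.cache:
            self.cache[signature] = self._compile(signature)
        return self.cache[signature](*args)

@ToyDispatcher
def total(x, y):
    return x + y

total(1, 2)        # first (int, int) call: triggers a "compilation"
total(3, 4)        # same argument types: served from the cache
total(1.5, 2.5)    # new (float, float) signature: a second "compilation"
print(total.compile_count, len(total.cache))
```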

For most uses, using jit without a signature will be the simplest option.

bubblesort_autojit = numba.jit(bubblesort)
%timeit sorted[:] = shuffled[:]; bubblesort_autojit(sorted)
1000 loops, best of 3: 1.25 ms per loop

Some extra remarks

There is no magic; there are several details that are good to know about numba.

First, compiling takes time. Luckily, it will not be a lot of time, especially for small functions. But when compiling many functions with many specializations, the time may add up. Numba tries to do its best by caching compiled code as much as possible, though, so that no time is spent in spurious compilation. It also tries to be lazy regarding compilation, which avoids paying the compilation time for code that is never used.

Second, not all code is compiled equal. There will be code that numba compiles down to an efficient native function. Sometimes the generated code has to fall back to the Python object system and its dispatch semantics. Other code may not compile at all.

When targeting the “cpu” target (the default), numba will generate either:

  • Fast native code (also called nopython mode). The compiler was able to infer all the types in the function, so it can translate the code to a fast native routine without making use of the Python run-time.
  • Native code with calls to the Python run-time (also called object mode). The compiler was not able to infer all the types, so at some point a value was typed as a generic ‘object’. This means the full native version can’t be used. Instead, numba generates code using the Python run-time that should be faster than actual interpretation but quite far from what you could expect from a full native function.

By default, the ‘cpu’ target tries to compile the function in ‘nopython’ mode. If this fails, it tries again in object mode.

This example shows how falling back to Python objects may cause a slowdown in the generated code:

@numba.jit("void(i1[:])")
def test(value):
    for i in xrange(len(value)):
        value[i] = i % 100

from decimal import Decimal
@numba.jit("void(i1[:])")
def test2(value):
    for i in xrange(len(value)):
        value[i] = i % Decimal(100)

res = np.zeros((10000,), dtype="i1")
%timeit test(res)
10000 loops, best of 3: 31.9 µs per loop
%timeit test2(res)
1 loops, best of 3: 283 ms per loop

It is possible to force an error when nopython code generation fails, instead of falling back to object mode. This gives some feedback about whether code that doesn’t rely on the Python run-time can be generated for a given function. This can help when trying to write fast code, as object mode can have a huge performance penalty. For example, test still compiles when the nopython keyword is passed:

@numba.jit("void(i1[:])", nopython=True)
def test(value):
    for i in xrange(len(value)):
        value[i] = i % 100

On the other hand, test2 fails if we pass the nopython keyword:

@numba.jit("void(i1[:])", nopython=True)
def test2(value):
    for i in xrange(len(value)):
        value[i] = i % Decimal(100)
---------------------------------------------------------------------------
TypingError                               Traceback (most recent call last)

<ipython-input-19-6038b783c49c> in <module>()
----> 1 @numba.jit("void(i1[:])", nopython=True)
      2 def test2(value):
      3     for i in xrange(len(value)):
      4         value[i] = i % Decimal(100)


/Users/jayvius/Projects/numba/numba/decorators.pyc in wrapper(func)
    125         disp = dispatcher(py_func=func,  locals=locals,
    126                           targetoptions=targetoptions)
--> 127         disp.compile(sig)
    128         disp.disable_compile()
    129         return disp


/Users/jayvius/Projects/numba/numba/dispatcher.pyc in compile(self, sig, locals, **targetoptions)
    107             cres = compiler.compile_extra(typingctx, targetctx, self.py_func,
    108                                           args=args, return_type=return_type,
--> 109                                           flags=flags, locals=locs)
    110
    111             # Check typing error if object mode is used


/Users/jayvius/Projects/numba/numba/compiler.pyc in compile_extra(typingctx, targetctx, func, args, return_type, flags, locals)
     77                                                                    args,
     78                                                                    return_type,
---> 79                                                                    locals)
     80         except Exception as e:
     81             if not flags.enable_pyobject:


/Users/jayvius/Projects/numba/numba/compiler.pyc in type_inference_stage(typingctx, interp, args, return_type, locals)
    156         infer.seed_type(k, v)
    157
--> 158     infer.build_constrain()
    159     infer.propagate()
    160     typemap, restype, calltypes = infer.unify()


/Users/jayvius/Projects/numba/numba/typeinfer.pyc in build_constrain(self)
    271         for blk in utils.dict_itervalues(self.blocks):
    272             for inst in blk.body:
--> 273                 self.constrain_statement(inst)
    274
    275     def propagate(self):


/Users/jayvius/Projects/numba/numba/typeinfer.pyc in constrain_statement(self, inst)
    368     def constrain_statement(self, inst):
    369         if isinstance(inst, ir.Assign):
--> 370             self.typeof_assign(inst)
    371         elif isinstance(inst, ir.SetItem):
    372             self.typeof_setitem(inst)


/Users/jayvius/Projects/numba/numba/typeinfer.pyc in typeof_assign(self, inst)
    390                                              src=value.name, loc=inst.loc))
    391         elif isinstance(value, ir.Global):
--> 392             self.typeof_global(inst, inst.target, value)
    393         elif isinstance(value, ir.Expr):
    394             self.typeof_expr(inst, inst.target, value)


/Users/jayvius/Projects/numba/numba/typeinfer.pyc in typeof_global(self, inst, target, gvar)
    470             except KeyError:
    471                 raise TypingError("Untyped global name '%s'" % gvar.name,
--> 472                                   loc=inst.loc)
    473             self.assumed_immutables.add(inst)
    474             self.typevars[target.name].lock(gvty)


TypingError: Untyped global name 'Decimal'
File "<ipython-input-19-6038b783c49c>", line 4