First Steps with numba
======================


.. code:: python

    import numba
    print(numba.__version__)

.. parsed-literal::

    0.12.0


Introduction to numba
---------------------


Numba allows the compilation of selected portions of Python code to
native code, using LLVM as its backend. This allows the selected
functions to execute at a speed competitive with code generated by C
compilers.

It works at the function level. We can take a function, generate native
code for that function as well as the wrapper code needed to call it
directly from Python. This compilation is done on-the-fly and in-memory.

In this notebook I will illustrate some very simple usage of numba.

A simple example
----------------


Let's start with a simple yet time-consuming function: a Python
implementation of bubblesort. This bubblesort implementation works on a
NumPy array.

.. code:: python

    def bubblesort(X):
        N = len(X)
        for end in range(N, 1, -1):
            for i in range(end - 1):
                cur = X[i]
                if cur > X[i + 1]:
                    tmp = X[i]
                    X[i] = X[i + 1]
                    X[i + 1] = tmp

Now let's try the function to check that it works. First
we'll create an array of sorted values and randomly shuffle them:

.. code:: python

    import numpy as np
    
    original = np.arange(0.0, 10.0, 0.01, dtype='f4')
    shuffled = original.copy()
    np.random.shuffle(shuffled)

Now we'll create a copy and do our bubble sort on the copy:

.. code:: python

    sorted = shuffled.copy()
    bubblesort(sorted)
    print(np.array_equal(sorted, original))

.. parsed-literal::

    True


Let's see how long it takes to execute:

.. code:: python

    sorted[:] = shuffled[:]
    %timeit sorted[:] = shuffled[:]; bubblesort(sorted)

.. parsed-literal::

    1 loops, best of 3: 328 ms per loop


Note that execution time may depend on the input, and the function
itself is destructive (it sorts the array in place). To use the same
input in all the timings, I copy the original shuffled array into the
array being sorted as part of the timed statement. %timeit makes several
runs and reports the best result; if the copy weren't done inside the
timed code, the vector would only be unsorted in the first iteration.
Since bubblesort does less work on a vector that is already sorted, the
later runs would be the fastest ones, and we would end up reporting the
time to run bubblesort on an already sorted array. In our case the copy
time is minimal, though:

.. code:: python

    %timeit sorted[:] = shuffled[:]

.. parsed-literal::

    1000000 loops, best of 3: 1.17 µs per loop
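
The same methodology can be reproduced outside IPython with the standard
library's ``timeit`` module. The following is a minimal sketch that uses a
plain Python list instead of a NumPy array, so that it runs without extra
dependencies; the names ``work``, ``fair`` and ``unfair`` are only
illustrative. It times the sort once with the reset inside the timed
statement and once without it:

.. code:: python

    import random
    import timeit


    def bubblesort(X):
        # Same algorithm as above; works on any mutable sequence.
        N = len(X)
        for end in range(N, 1, -1):
            for i in range(end - 1):
                if X[i] > X[i + 1]:
                    X[i], X[i + 1] = X[i + 1], X[i]


    original = [i * 0.01 for i in range(500)]
    shuffled = original[:]
    random.shuffle(shuffled)
    work = shuffled[:]

    # Explicit namespace for the timed statements (same list objects).
    namespace = {"bubblesort": bubblesort, "shuffled": shuffled, "work": work}

    # Reset inside the timed statement: every repetition sorts shuffled data.
    fair = min(timeit.repeat("work[:] = shuffled; bubblesort(work)",
                             globals=namespace, number=1, repeat=3))

    # No reset: only the first repetition sees shuffled data, the later
    # ones "sort" a list that is already sorted.
    work[:] = shuffled
    unfair = min(timeit.repeat("bubblesort(work)",
                               globals=namespace, number=1, repeat=3))

    print("with reset:    %.4f s" % fair)
    print("without reset: %.4f s" % unfair)

The second figure should usually come out lower, because after the first
repetition no swaps are needed; it does not measure sorting shuffled data.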


Compiling a function with numba.jit using an explicit function signature
------------------------------------------------------------------------


Let's get a numba version of this code running. One way to compile a
function is by using the *numba.jit* decorator with an explicit
signature. Later, we will see that we can get by without providing such
a *signature* by letting *numba* figure out the *signatures* by itself.
However, it is useful to know what the signature is, and what role it
has in *numba*.

First, let's peek at the numba.jit docstring:

.. code:: python

    print(numba.jit.__doc__)

::

    jit([signature_or_function, [locals={}, [target='cpu',
                [**targetoptions]]]])
    
        The function can be used as the following versions:
    
        1) jit(signature, [target='cpu', [**targetoptions]]) -> jit(function)
    
            Equivalent to:
    
                d = dispatcher(function, targetoptions)
                d.compile(signature)
    
            Create a dispatcher object for a python function and default
            target-options.  Then, compile the funciton with the given signature.
    
            Example:
    
                @jit("void(int32, float32)")
                def foo(x, y):
                    return x + y
    
        2) jit(function) -> dispatcher
    
            Same as old autojit.  Create a dispatcher function object that
            specialize at call site.
    
            Example:
    
                @jit
                def foo(x, y):
                    return x + y
    
        3) jit([target='cpu', [**targetoptions]]) -> configured_jit(function)
    
            Same as old autojit and 2).  But configure with target and default
            target-options.
    
    
            Example:
    
                @jit(target='cpu', nopython=True)
                def foo(x, y):
                    return x + y
    
        Target Options
        ---------------
        The CPU (default target) defines the following:
    
            - nopython: [bool]
    
                Set to True to disable the use of PyObjects and Python API
                calls.  The default behavior is to allow the use of PyObjects and
                Python API.  Default value is False.
    
            - forceobj: [bool]
    
                Set to True to force the use of PyObjects for every value.  Default
                value is False.
    
        


So let's make a compiled version of our bubblesort:

.. code:: python

    bubblesort_jit = numba.jit("void(f4[:])")(bubblesort)

At this point, **bubblesort\_jit** contains the compiled function
(wrapped so that it is directly callable from Python) generated from the
original bubblesort function. Note the unusual parameter
*"void(f4[:])"* that is passed. That parameter describes the *signature*
of the function to generate (more on this later).

Let's check that it works:

.. code:: python

    sorted[:] = shuffled[:] # reset to shuffled before sorting
    bubblesort_jit(sorted)
    print(np.array_equal(sorted, original))

.. parsed-literal::

    True


Now let's compare the execution time of the compiled function with that
of the original:

.. code:: python

    %timeit sorted[:] = shuffled[:]; bubblesort_jit(sorted)

.. parsed-literal::

    1000 loops, best of 3: 1.25 ms per loop


.. code:: python

    %timeit sorted[:] = shuffled[:]; bubblesort(sorted)

.. parsed-literal::

    1 loops, best of 3: 323 ms per loop


Bear in mind that numba.jit is a decorator, although for practical
reasons in this tutorial we will be calling it like a function to have
access to both the original function and the jitted one. In many
practical uses, the decorator syntax may be more appropriate. With the
decorator syntax our sample looks like this:

.. code:: python

    @numba.jit("void(f4[:])")
    def bubblesort_jit(X):
        N = len(X)
        for end in range(N, 1, -1):
            for i in range(end - 1):
                cur = X[i]
                if cur > X[i + 1]:
                    tmp = X[i]
                    X[i] = X[i + 1]
                    X[i + 1] = tmp
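
With the decorator syntax, the pure Python version is not lost. In recent
numba releases the object returned by *numba.jit* keeps a reference to the
original function in its ``py_func`` attribute (not verified against
0.12). A short sketch, with the swap written as a tuple assignment, which
numba also compiles:

.. code:: python

    import numpy as np
    import numba


    @numba.jit("void(f4[:])")
    def bubblesort_jit(X):
        N = len(X)
        for end in range(N, 1, -1):
            for i in range(end - 1):
                if X[i] > X[i + 1]:
                    X[i], X[i + 1] = X[i + 1], X[i]


    # The uncompiled Python function is still reachable from the dispatcher.
    bubblesort_py = bubblesort_jit.py_func

    data = np.array([3.0, 1.0, 2.0], dtype='f4')
    native, interpreted = data.copy(), data.copy()
    bubblesort_jit(native)        # runs the compiled code
    bubblesort_py(interpreted)    # runs the plain Python code
    print(native, interpreted)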

Signature
---------


In order to generate fast code, the compiler needs type information for
the code. This allows a direct mapping from the Python operations to the
appropriate machine instructions without any type check/dispatch
mechanism. In numba, it usually suffices to specify the types of the
parameters: in many cases, numba can deduce the types of intermediate
values, as well as of the return value, using *type inference*. For
convenience, it is also possible to specify the type of the *return
value* in the signature.

A *numba.jit* compiled function will only work when called with the
right type of arguments (it may, however, perform some conversions on
types that it considers equivalent).

A *signature* contains the return type as well as the argument types.
One way to specify the signature is using a string, as in our example.
The *signature* takes the form:
``<return type> ( <arg1 type>, <arg2 type>, ... )``. The types may be
scalars or arrays (NumPy arrays). In our example, ``void(f4[:])`` means
a function with no return value (the return type is ``void``) that takes
a single argument: a one-dimensional array of 4-byte floats, ``f4[:]``.
Starting with numba version 0.12, the result type is optional. In that
case the signature looks like the following:
``<arg1 type>, <arg2 type>, ...``. When the signature doesn't provide a
type for the return value, the type is *inferred*.

When using such a string, the type of each argument is based on NumPy
dtype strings for base types. Array types are also supported through the
``[:]`` notation, where ``[:]`` is a one-dimensional strided array,
``[::1]`` a one-dimensional contiguous array, ``[:,:]`` a
two-dimensional strided array, ``[:,:,:]`` a three-dimensional array,
and so on. There are other ways to build the signature; you can find
more details about signatures on their documentation page.

Some sample signatures follow:

+-----------------------------+----------------------------------------------------------------------------------------------------------------------------+
| signature                   | meaning                                                                                                                    |
+=============================+============================================================================================================================+
| ``void(f4[:], u8)``         | a function with no return value taking a one-dimensional array of single precision floats and a 64-bit unsigned integer.   |
+-----------------------------+----------------------------------------------------------------------------------------------------------------------------+
| ``i4(f8)``                  |  a function returning a 32-bit signed integer taking a double precision float as argument.                                 |
+-----------------------------+----------------------------------------------------------------------------------------------------------------------------+
| ``void(f4[:,:],f4[:,:])``   | a function with no return value taking two 2-dimensional arrays as arguments.                                              |
+-----------------------------+----------------------------------------------------------------------------------------------------------------------------+
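
The signatures in the table can also be built from numba's type objects
instead of strings: calling a return type with the argument types produces
a signature object. A sketch, assuming a numba release where ``void``,
``int32``, ``float32``, ``float64`` and ``uint64`` can be imported from the
top-level ``numba`` package, as in current versions; ``add`` is only an
illustrative function:

.. code:: python

    import numba
    from numba import void, int32, float32, float64, uint64

    # return_type(arg_types...) builds a signature, mirroring the table rows.
    sig_a = void(float32[:], uint64)              # "void(f4[:], u8)"
    sig_b = int32(float64)                        # "i4(f8)"
    sig_c = void(float32[:, :], float32[:, :])    # "void(f4[:,:],f4[:,:])"

    for sig in (sig_a, sig_b, sig_c):
        print(sig.return_type, sig.args)


    # A signature object is accepted by numba.jit in place of a string.
    @numba.jit(float64(float64, float64))
    def add(x, y):
        return x + y


    print(add(1.5, 2.0))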

For a more in-depth explanation on supported types you can take a look
at the "Numba types" notebook tutorial.

Compiling a function without providing a function signature (autojit functionality)
-----------------------------------------------------------------------------------


Starting with numba version 0.12, it is possible to use *numba.jit*
without providing a type-signature for the function. This functionality
was provided by *numba.autojit* in previous versions of *numba*. The old
*numba.autojit* has been deprecated in favour of this signature-less
version of *numba.jit*.

When no *type-signature* is provided, the decorator returns wrapper code
that will automatically create and run a *numba* compiled version when
called. When called, the resulting function infers the types of the
arguments being used. That information is used to generate the
*signature* to be used when compiling. The resulting compiled function
is then called with the provided arguments.

For performance reasons, compiled functions are cached so that code is
only compiled once for a given signature. It is possible to call the
function with different signatures; in that case, different native code
is generated and the right version is chosen based on the argument
types.
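
The caching behaviour described above can be pictured with a small
pure-Python model. The ``SpecializingDispatcher`` class below is
hypothetical and generates no native code: "compiling" only counts and
reuses the Python function. It also keys the cache on Python ``type()``
for simplicity, whereas numba computes its own type for each argument
(for arrays: dtype, number of dimensions and layout):

.. code:: python

    class SpecializingDispatcher:
        """Hypothetical model of a lazy, per-signature compilation cache."""

        def __init__(self, py_func):
            self.py_func = py_func
            self.cache = {}           # signature -> "compiled" callable
            self.compile_count = 0

        def _compile(self, signature):
            # Stand-in for code generation: numba would emit native code
            # for `signature` here; this model only counts and reuses py_func.
            self.compile_count += 1
            return self.py_func

        def __call__(self, *args):
            # Derive a signature from the runtime types of the arguments.
            signature = tuple(type(arg) for arg in args)
            impl = self.cache.get(signature)
            if impl is None:
                # Compile lazily: only the first time a signature is seen.
                impl = self._compile(signature)
                self.cache[signature] = impl
            return impl(*args)


    def add(x, y):
        return x + y


    add_dispatch = SpecializingDispatcher(add)
    add_dispatch(1, 2)        # first (int, int) call: "compiles"
    add_dispatch(3, 4)        # (int, int) again: served from the cache
    add_dispatch(1.0, 2.5)    # (float, float): a second specialization
    print(add_dispatch.compile_count)   # 2

Only the first call for each distinct argument-type combination pays the
"compilation" step; repeated calls go straight to the cached version.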

For most uses, using jit without a signature will be the simplest
option.

.. code:: python

    bubblesort_autojit = numba.jit(bubblesort)

.. code:: python

    %timeit sorted[:] = shuffled[:]; bubblesort_autojit(sorted)

.. parsed-literal::

    1000 loops, best of 3: 1.25 ms per loop


Some extra remarks
------------------


There is no magic: there are several details about numba that are good
to know.

First, compiling takes time. Luckily, it will not be a lot of time,
especially for small functions. But when compiling many functions with
many specializations, the time may add up. Numba does its best to cache
compilation results, so no time is spent on spurious compilation. It
also tries to be *lazy* about compilation, so you don't pay the
compilation time for code that is never used.
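
A quick way to see the compilation cost is to time the first call to a
signature-less function, which includes type inference and compilation,
against a later call with the same argument types. A sketch, assuming a
numba release where dispatchers list their compiled specializations in a
``signatures`` attribute (true of recent versions); ``total`` is only an
illustrative function, and absolute timings will vary by machine:

.. code:: python

    import time

    import numba


    @numba.jit(nopython=True)   # no signature: compiled lazily on first call
    def total(n):
        s = 0
        for i in range(n):
            s += i
        return s


    t0 = time.perf_counter()
    first = total(1000)         # includes type inference and compilation
    t1 = time.perf_counter()
    second = total(1000)        # reuses the cached machine code
    t2 = time.perf_counter()

    print("first call:  %.6f s" % (t1 - t0))
    print("second call: %.6f s" % (t2 - t1))
    print(total.signatures)     # the specializations compiled so far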

Second, not all code is compiled equal. There will be code that *numba*
compiles down to an efficient native function. Sometimes the generated
code has to fall back to the Python object system and its dispatch
semantics. Other code may not compile at all.

When targeting the "cpu" target (the default), *numba* will either
generate:

-  Fast native code (also called 'nopython'). The compiler was able to
   infer all the types in the function, so it can translate the code to
   a fast native routine without making use of the Python runtime.

-  Native code with calls to the Python run-time (also called object
   mode). The compiler was not able to infer all the types, so at some
   point a value was typed as a generic 'object'. This means the full
   native version can't be used. Instead, numba generates code that uses
   the Python run-time; it should be faster than actual interpretation,
   but quite far from what you could expect from a full native function.

By default, the 'cpu' target tries to compile the function in 'nopython'
mode. If this fails, it tries again in object mode.

This example shows how falling back to Python objects may cause a
slowdown in the generated code:

.. code:: python

    @numba.jit("void(i1[:])")
    def test(value):
        for i in xrange(len(value)):
            value[i] = i % 100
    
    from decimal import Decimal
    @numba.jit("void(i1[:])")
    def test2(value):
        for i in xrange(len(value)):
            value[i] = i % Decimal(100)
    
    res = np.zeros((10000,), dtype="i1")

.. code:: python

    %timeit test(res)

.. parsed-literal::

    10000 loops, best of 3: 31.9 µs per loop


.. code:: python

    %timeit test2(res)

.. parsed-literal::

    1 loops, best of 3: 283 ms per loop


It is possible to make compilation fail outright when *nopython* code
generation fails, by passing ``nopython=True``. This gives feedback about
whether it is possible to generate code for a given function that doesn't
rely on the Python run-time. This can help when trying to write fast
code, as object mode can have a huge performance penalty.
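
That feedback can also be collected programmatically. The helper
``compiles_in_nopython`` below is hypothetical (not part of numba): it
tries to compile with ``nopython=True`` and reports whether that raised.
It catches ``numba.core.errors.NumbaError``, the base class of numba's
compilation errors in recent releases (the module path differs in older
versions). ``fill`` and ``fill_decimal`` mirror *test* and *test2* above:

.. code:: python

    from decimal import Decimal

    import numba
    from numba.core.errors import NumbaError


    def compiles_in_nopython(py_func, signature):
        """Hypothetical helper: True if py_func compiles in nopython mode."""
        try:
            # With an explicit signature, compilation happens right here.
            numba.jit(signature, nopython=True)(py_func)
        except NumbaError:
            return False
        return True


    def fill(value):
        for i in range(len(value)):
            value[i] = i % 100


    def fill_decimal(value):
        for i in range(len(value)):
            value[i] = i % Decimal(100)   # Decimal has no numba type


    print(compiles_in_nopython(fill, "void(i1[:])"))
    print(compiles_in_nopython(fill_decimal, "void(i1[:])"))

The first check should report ``True`` and the second ``False``, the same
outcome as the two decorated examples that follow.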

.. code:: python

    @numba.jit("void(i1[:])", nopython=True)
    def test(value):
        for i in xrange(len(value)):
            value[i] = i % 100

On the other hand, *test2* fails if we pass the *nopython* keyword:

.. code:: python

    @numba.jit("void(i1[:])", nopython=True)
    def test2(value):
        for i in xrange(len(value)):
            value[i] = i % Decimal(100)


::


    ---------------------------------------------------------------------------
    TypingError                               Traceback (most recent call last)

    <ipython-input-19-6038b783c49c> in <module>()
    ----> 1 @numba.jit("void(i1[:])", nopython=True)
          2 def test2(value):
          3     for i in xrange(len(value)):
          4         value[i] = i % Decimal(100)


    /Users/jayvius/Projects/numba/numba/decorators.pyc in wrapper(func)
        125         disp = dispatcher(py_func=func,  locals=locals,
        126                           targetoptions=targetoptions)
    --> 127         disp.compile(sig)
        128         disp.disable_compile()
        129         return disp


    /Users/jayvius/Projects/numba/numba/dispatcher.pyc in compile(self, sig, locals, **targetoptions)
        107             cres = compiler.compile_extra(typingctx, targetctx, self.py_func,
        108                                           args=args, return_type=return_type,
    --> 109                                           flags=flags, locals=locs)
        110 
        111             # Check typing error if object mode is used


    /Users/jayvius/Projects/numba/numba/compiler.pyc in compile_extra(typingctx, targetctx, func, args, return_type, flags, locals)
         77                                                                    args,
         78                                                                    return_type,
    ---> 79                                                                    locals)
         80         except Exception as e:
         81             if not flags.enable_pyobject:


    /Users/jayvius/Projects/numba/numba/compiler.pyc in type_inference_stage(typingctx, interp, args, return_type, locals)
        156         infer.seed_type(k, v)
        157 
    --> 158     infer.build_constrain()
        159     infer.propagate()
        160     typemap, restype, calltypes = infer.unify()


    /Users/jayvius/Projects/numba/numba/typeinfer.pyc in build_constrain(self)
        271         for blk in utils.dict_itervalues(self.blocks):
        272             for inst in blk.body:
    --> 273                 self.constrain_statement(inst)
        274 
        275     def propagate(self):


    /Users/jayvius/Projects/numba/numba/typeinfer.pyc in constrain_statement(self, inst)
        368     def constrain_statement(self, inst):
        369         if isinstance(inst, ir.Assign):
    --> 370             self.typeof_assign(inst)
        371         elif isinstance(inst, ir.SetItem):
        372             self.typeof_setitem(inst)


    /Users/jayvius/Projects/numba/numba/typeinfer.pyc in typeof_assign(self, inst)
        390                                              src=value.name, loc=inst.loc))
        391         elif isinstance(value, ir.Global):
    --> 392             self.typeof_global(inst, inst.target, value)
        393         elif isinstance(value, ir.Expr):
        394             self.typeof_expr(inst, inst.target, value)


    /Users/jayvius/Projects/numba/numba/typeinfer.pyc in typeof_global(self, inst, target, gvar)
        470             except KeyError:
        471                 raise TypingError("Untyped global name '%s'" % gvar.name,
    --> 472                                   loc=inst.loc)
        473             self.assumed_immutables.add(inst)
        474             self.typevars[target.name].lock(gvty)


    TypingError: Untyped global name 'Decimal'
    File "<ipython-input-19-6038b783c49c>", line 4