First Steps with numba
======================

.. code:: python

    import numba
    print(numba.__version__)

.. parsed-literal::

    0.12.0

Introduction to numba
---------------------

Numba allows the compilation of selected portions of Python code to native code, using LLVM as its backend. This allows the selected functions to execute at a speed competitive with code generated by C compilers.

It works at the function level. We can take a function and generate native code for it, as well as the wrapper code needed to call it directly from Python. This compilation is done on-the-fly and in-memory.

In this notebook I will illustrate some very simple usage of numba.

A simple example
----------------

Let's start with a simple yet time-consuming function: a Python implementation of bubblesort. This bubblesort implementation works on a NumPy array.

.. code:: python

    def bubblesort(X):
        N = len(X)
        for end in range(N, 1, -1):
            for i in range(end - 1):
                cur = X[i]
                if cur > X[i + 1]:
                    tmp = X[i]
                    X[i] = X[i + 1]
                    X[i + 1] = tmp

Now let's try the function to check that it works. First we'll create an array of sorted values and randomly shuffle them:

.. code:: python

    import numpy as np

    original = np.arange(0.0, 10.0, 0.01, dtype='f4')
    shuffled = original.copy()
    np.random.shuffle(shuffled)

Now we'll create a copy and do our bubble sort on the copy:

.. code:: python

    sorted = shuffled.copy()
    bubblesort(sorted)
    print(np.array_equal(sorted, original))

.. parsed-literal::

    True

Let's see how it behaves in execution time:

.. code:: python

    sorted[:] = shuffled[:]
    %timeit sorted[:] = shuffled[:]; bubblesort(sorted)

.. parsed-literal::

    1 loops, best of 3: 328 ms per loop

Note that, as execution time may depend on the input and the function itself is destructive, I make sure to use the same input in all the timings by copying the original shuffled array into the new one.
``%timeit`` makes several runs and takes the best result. If the copy weren't done inside the timed code, the vector would only be unsorted in the first iteration. As bubblesort works better on vectors that are already sorted, the later runs would be selected, and we would get the time for running bubblesort on an already sorted array. In our case the copy time is minimal, though:

.. code:: python

    %timeit sorted[:] = shuffled[:]

.. parsed-literal::

    1000000 loops, best of 3: 1.17 µs per loop

Compiling a function with numba.jit using an explicit function signature
------------------------------------------------------------------------

Let's get a numba version of this code running. One way to compile a function is by using the *numba.jit* decorator with an explicit signature. Later, we will see that we can get by without providing such a *signature* by letting *numba* figure out the *signatures* by itself. However, it is useful to know what the signature is, and what role it has in *numba*.

First, let's start by peeking at the numba.jit docstring:

.. code:: python

    print(numba.jit.__doc__)

::

    jit([signature_or_function, [locals={}, [target='cpu',
                [**targetoptions]]]])

        The function can be used as the following versions:

        1) jit(signature, [target='cpu', [**targetoptions]]) -> jit(function)

            Equivalent to:

                d = dispatcher(function, targetoptions)
                d.compile(signature)

            Create a dispatcher object for a python function and default
            target-options.  Then, compile the funciton with the given signature.

            Example:

                @jit("void(int32, float32)")
                def foo(x, y):
                    return x + y

        2) jit(function) -> dispatcher

            Same as old autojit.  Create a dispatcher function object that
            specialize at call site.

            Example:

                @jit
                def foo(x, y):
                    return x + y

        3) jit([target='cpu', [**targetoptions]]) -> configured_jit(function)

            Same as old autojit and 2).  But configure with target and default
            target-options.

            Example:

                @jit(target='cpu', nopython=True)
                def foo(x, y):
                    return x + y

        Target Options
        ---------------
        The CPU (default target) defines the following:

            - nopython: [bool]

                Set to True to disable the use of PyObjects and Python API
                calls.  The default behavior is to allow the use of PyObjects and
                Python API.  Default value is False.

            - forceobj: [bool]

                Set to True to force the use of PyObjects for every value.  Default
                value is False.

So let's make a compiled version of our bubblesort:

.. code:: python

    bubblesort_jit = numba.jit("void(f4[:])")(bubblesort)

At this point, **bubblesort\_jit** contains the compiled function generated from the original bubblesort function, wrapped so that it is directly callable from Python. Note that a parameter, *"void(f4[:])"*, is passed to *numba.jit*. That parameter describes the *signature* of the function to generate (more on this later).

Let's check that it works:

.. code:: python

    sorted[:] = shuffled[:]  # reset to shuffled before sorting
    bubblesort_jit(sorted)
    print(np.array_equal(sorted, original))

.. parsed-literal::

    True

Now let's compare the execution time of the compiled function with that of the original:

.. code:: python

    %timeit sorted[:] = shuffled[:]; bubblesort_jit(sorted)

.. parsed-literal::

    1000 loops, best of 3: 1.25 ms per loop

.. code:: python

    %timeit sorted[:] = shuffled[:]; bubblesort(sorted)

.. parsed-literal::

    1 loops, best of 3: 323 ms per loop

Bear in mind that numba.jit is a decorator, although for practical reasons in this tutorial we will be calling it like a function to have access to both the original function and the jitted one. In many practical uses, the decorator syntax may be more appropriate. With the decorator syntax our sample will look like this:

.. code:: python

    @numba.jit("void(f4[:])")
    def bubblesort_jit(X):
        N = len(X)
        for end in range(N, 1, -1):
            for i in range(end - 1):
                cur = X[i]
                if cur > X[i + 1]:
                    tmp = X[i]
                    X[i] = X[i + 1]
                    X[i + 1] = tmp

Signature
---------

In order to generate fast code, the compiler needs type information for the code. This allows a direct mapping from the Python operations to the appropriate machine instruction without any type check/dispatch mechanism.

In numba, in most cases it suffices to specify the types for the parameters. In many cases, numba can deduce types for intermediate values as well as the return value using *type inference*. For convenience, it is also possible to specify in the signature the type of the *return value*.

A *numba.jit* compiled function will only work when called with the right type of arguments (it may, however, perform some conversions on types that it considers equivalent).

A *signature* contains the return type as well as the argument types. One way to specify the signature is using a string, like in our example. The *signature* takes the form: ``<return type> ( <arg1 type>, <arg2 type>, ... )``. The types may be scalars or arrays (NumPy arrays). In our example, ``void(f4[:])`` means a function with no return value (the return type is ``void``) that takes as its only argument a one-dimensional array of 4-byte floats, ``f4[:]``.

Starting with numba version 0.12 the result type is optional. In that case the signature will look like the following: ``<arg1 type>, <arg2 type>, ...``. When the signature doesn't provide a type for the return value, the type is *inferred*.

When the signature is given as a string, the type of each argument is based on NumPy dtype strings for base types. Array types are also supported by using the ``[:]`` notation, where ``[:]`` is a one-dimensional strided array, ``[::1]`` is a one-dimensional contiguous array, ``[:,:]`` a two-dimensional strided array, ``[:,:,:]`` a three-dimensional array, and so on.
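
As a rough illustration of this notation, the following sketch uses only NumPy (not numba) to show what the dtype strings stand for and what separates a strided array from a contiguous one. The names ``full`` and ``every_other`` are made up for this example.

.. code:: python

    import numpy as np

    # The base-type strings used in signatures are NumPy dtype strings.
    print(np.dtype('f4'), np.dtype('f4').itemsize)   # float32 4
    print(np.dtype('u8'), np.dtype('u8').itemsize)   # uint64 8

    # A freshly created one-dimensional array is contiguous in memory.
    full = np.arange(10, dtype='f4')
    print(full.flags['C_CONTIGUOUS'], full.strides)   # True (4,)

    # A view that skips elements is strided: consecutive items are 8 bytes apart.
    every_other = full[::2]
    print(every_other.flags['C_CONTIGUOUS'], every_other.strides)   # False (8,)

Under these assumptions, a view such as ``every_other`` would be described by ``f4[:]`` rather than ``f4[::1]``, since its elements are not adjacent in memory.
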
There are other ways to build the signature; you can find more details on signatures in the documentation page.

Some sample signatures follow:

+-----------------------------+------------------------------------------------------------+
| signature                   | meaning                                                    |
+=============================+============================================================+
| ``void(f4[:], u8)``         | a function with no return value taking a one-dimensional   |
|                             | array of single precision floats and a 64-bit unsigned     |
|                             | integer.                                                   |
+-----------------------------+------------------------------------------------------------+
| ``i4(f8)``                  | a function returning a 32-bit signed integer taking a      |
|                             | double precision float as argument.                        |
+-----------------------------+------------------------------------------------------------+
| ``void(f4[:,:],f4[:,:])``   | a function with no return value taking two 2-dimensional   |
|                             | arrays as arguments.                                       |
+-----------------------------+------------------------------------------------------------+

For a more in-depth explanation of supported types you can take a look at the "Numba types" notebook tutorial.

Compiling a function without providing a function signature (autojit functionality)
-----------------------------------------------------------------------------------

Starting with numba version 0.12, it is possible to use *numba.jit* without providing a type-signature for the function. This functionality was provided by *numba.autojit* in previous versions of *numba*. The old *numba.autojit* has been deprecated in favour of this signature-less version of *numba.jit*.

When no *type-signature* is provided, the decorator returns wrapper code that will automatically create and run a *numba* compiled version when called. When called, the resulting function will infer the types of the arguments being used. That information will be used to generate the *signature* to be used when compiling. The resulting compiled function will be called with the provided arguments.

For performance reasons, functions are cached so that code is only compiled once for a given signature. It is possible to call the function with different signatures; in that case, different native code will be generated and the right version will be chosen based on the argument types.

For most uses, using jit without a signature will be the simplest option.

.. code:: python

    bubblesort_autojit = numba.jit(bubblesort)

.. code:: python

    %timeit sorted[:] = shuffled[:]; bubblesort_autojit(sorted)

.. parsed-literal::

    1000 loops, best of 3: 1.25 ms per loop

Some extra remarks
------------------

There is no magic; there are several details that are good to know about numba.

First, compiling takes time. Luckily, it will not be a lot of time, especially for small functions. But when compiling many functions with many specializations the time may add up. Numba tries to do its best by caching compilation as much as possible, though, so no time is spent in spurious compilation. It does its best to be *lazy* regarding compilation; this avoids paying the compilation time for code that is not used.

Second, not all code is compiled equal. There will be code that *numba* compiles down to an efficient native function. Sometimes the generated code has to fall back to the Python object system and its dispatch semantics. Other code may not compile at all.

When targeting the "cpu" target (the default), *numba* will either generate:

- Fast native code (also called :term:`nopython mode`). The compiler was able to infer all the types in the function, so it can translate the code to a fast native routine without making use of the Python runtime.

- Native code with calls to the Python run-time (also called :term:`object mode`). The compiler was not able to infer all the types, so that at some point a value was typed as a generic 'object'. This means the full native version can't be used. Instead, numba generates code using the Python run-time that should be faster than actual interpretation but quite far from what you could expect from a full native function.

By default, the 'cpu' target tries to compile the function in 'nopython' mode. If this fails, it tries again in object mode.

This example shows how falling back to Python objects may cause a slowdown in the generated code:

.. code:: python

    @numba.jit("void(i1[:])")
    def test(value):
        for i in xrange(len(value)):
            value[i] = i % 100

    from decimal import Decimal

    @numba.jit("void(i1[:])")
    def test2(value):
        for i in xrange(len(value)):
            value[i] = i % Decimal(100)

    res = np.zeros((10000,), dtype="i1")

.. code:: python

    %timeit test(res)

.. parsed-literal::

    10000 loops, best of 3: 31.9 µs per loop

.. code:: python

    %timeit test2(res)

.. parsed-literal::

    1 loops, best of 3: 283 ms per loop

It is possible to force an error when *nopython* code generation fails, instead of falling back to object mode. This gives some feedback about whether it is possible to generate code for a given function that doesn't rely on the Python run-time. This can help when trying to write fast code, as object mode can have a huge performance penalty.

.. code:: python

    @numba.jit("void(i1[:])", nopython=True)
    def test(value):
        for i in xrange(len(value)):
            value[i] = i % 100

On the other hand, *test2* fails if we pass the *nopython* keyword:

.. code:: python

    @numba.jit("void(i1[:])", nopython=True)
    def test2(value):
        for i in xrange(len(value)):
            value[i] = i % Decimal(100)

::

    ---------------------------------------------------------------------------
    TypingError                               Traceback (most recent call last)
     in ()
    ----> 1 @numba.jit("void(i1[:])", nopython=True)
          2 def test2(value):
          3     for i in xrange(len(value)):
          4         value[i] = i % Decimal(100)

    /Users/jayvius/Projects/numba/numba/decorators.pyc in wrapper(func)
        125             disp = dispatcher(py_func=func, locals=locals,
        126                               targetoptions=targetoptions)
    --> 127             disp.compile(sig)
        128             disp.disable_compile()
        129             return disp

    /Users/jayvius/Projects/numba/numba/dispatcher.pyc in compile(self, sig, locals, **targetoptions)
        107             cres = compiler.compile_extra(typingctx, targetctx, self.py_func,
        108                                           args=args, return_type=return_type,
    --> 109                                           flags=flags, locals=locs)
        110
        111             # Check typing error if object mode is used

    /Users/jayvius/Projects/numba/numba/compiler.pyc in compile_extra(typingctx, targetctx, func, args, return_type, flags, locals)
         77                                               args,
         78                                               return_type,
    ---> 79                                               locals)
         80     except Exception as e:
         81         if not flags.enable_pyobject:

    /Users/jayvius/Projects/numba/numba/compiler.pyc in type_inference_stage(typingctx, interp, args, return_type, locals)
        156         infer.seed_type(k, v)
        157
    --> 158     infer.build_constrain()
        159     infer.propagate()
        160     typemap, restype, calltypes = infer.unify()

    /Users/jayvius/Projects/numba/numba/typeinfer.pyc in build_constrain(self)
        271         for blk in utils.dict_itervalues(self.blocks):
        272             for inst in blk.body:
    --> 273                 self.constrain_statement(inst)
        274
        275     def propagate(self):

    /Users/jayvius/Projects/numba/numba/typeinfer.pyc in constrain_statement(self, inst)
        368     def constrain_statement(self, inst):
        369         if isinstance(inst, ir.Assign):
    --> 370             self.typeof_assign(inst)
        371         elif isinstance(inst, ir.SetItem):
        372             self.typeof_setitem(inst)

    /Users/jayvius/Projects/numba/numba/typeinfer.pyc in typeof_assign(self, inst)
        390                                              src=value.name, loc=inst.loc))
        391         elif isinstance(value, ir.Global):
    --> 392             self.typeof_global(inst, inst.target, value)
        393         elif isinstance(value, ir.Expr):
        394             self.typeof_expr(inst, inst.target, value)

    /Users/jayvius/Projects/numba/numba/typeinfer.pyc in typeof_global(self, inst, target, gvar)
        470         except KeyError:
        471             raise TypingError("Untyped global name '%s'" % gvar.name,
    --> 472                               loc=inst.loc)
        473         self.assumed_immutables.add(inst)
        474         self.typevars[target.name].lock(gvty)

    TypingError: Untyped global name 'Decimal'
    File "", line 4
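
The lazy, cached compilation mentioned in these remarks can be pictured with a toy dispatcher written in plain Python. This is only a sketch of the idea, not numba's implementation: ``ToyDispatcher``, ``_specialize`` and ``compile_count`` are hypothetical names, and the "compilation" step is a stand-in that merely counts how often it runs.

.. code:: python

    class ToyDispatcher(object):
        """Toy model of per-signature caching (hypothetical, not numba code)."""

        def __init__(self, py_func):
            self.py_func = py_func
            self.cache = {}          # argument-type tuple -> "compiled" version
            self.compile_count = 0   # number of specializations created so far

        def _specialize(self, arg_types):
            # Stand-in for real compilation: count the call, reuse the Python function.
            self.compile_count += 1
            return self.py_func

        def __call__(self, *args):
            # The "signature" here is simply the tuple of argument types.
            key = tuple(type(a) for a in args)
            if key not in self.cache:
                # Lazy: specialize only on the first call with these types.
                self.cache[key] = self._specialize(key)
            return self.cache[key](*args)


    def add(x, y):
        return x + y

    toy_add = ToyDispatcher(add)
    toy_add(1, 2)       # first call with (int, int): specializes
    toy_add(3, 4)       # same types: the cached version is reused
    toy_add(1.0, 2.0)   # new types (float, float): specializes again
    print(toy_add.compile_count)   # 2

Nothing is specialized until the first call, and each distinct combination of argument types is handled once, which mirrors the behaviour described for signature-less *numba.jit* at a very coarse level.
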