Numba is a just-in-time compiler for Python that works best on code that uses NumPy arrays and functions, and loops. The most common way to use Numba is through its collection of decorators that can be applied to your functions to instruct Numba to compile them. When a call is made to a Numba-decorated function it is compiled to machine code “just-in-time” for execution, and all or part of your code can subsequently run at native machine code speed!
Out of the box Numba works with the following:
OS: Windows (32 and 64 bit), OSX and Linux (32 and 64 bit)
Architecture: x86, x86_64, ppc64le. Experimental on armv7l, armv8l (aarch64).
GPUs: Nvidia CUDA. Experimental on AMD ROC.
CPython
NumPy 1.15 - latest
How do I get it?
Numba is available as a conda package for the Anaconda Python distribution:
$ conda install numba
Numba also has wheels available:
$ pip install numba
Numba can also be compiled from source, although we do not recommend it for first-time Numba users.
Numba is often used as a core package so its dependencies are kept to an absolute minimum; however, extra packages can be installed as follows to provide additional functionality:
scipy - enables support for compiling numpy.linalg functions.
colorama - enables support for color highlighting in backtraces/error messages.
pyyaml - enables configuration of Numba via a YAML config file.
icc_rt - allows the use of the Intel SVML (high performance short vector math library, x86_64 only). Installation instructions are in the performance tips.
Will Numba work for my code?
This depends on what your code looks like. If your code is numerically orientated (does a lot of math), uses NumPy a lot and/or has a lot of loops, then Numba is often a good choice. In these examples we’ll apply the most fundamental of Numba’s JIT decorators, @jit, to try and speed up some functions to demonstrate what works well and what does not.
Numba works well on code that looks like this:
from numba import jit
import numpy as np

x = np.arange(100).reshape(10, 10)

@jit(nopython=True) # Set "nopython" mode for best performance, equivalent to @njit
def go_fast(a): # Function is compiled to machine code when called the first time
    trace = 0.0
    for i in range(a.shape[0]):   # Numba likes loops
        trace += np.tanh(a[i, i]) # Numba likes NumPy functions
    return a + trace              # Numba likes NumPy broadcasting

print(go_fast(x))
It won’t work very well, if at all, on code that looks like this:
from numba import jit
import pandas as pd

x = {'a': [1, 2, 3], 'b': [20, 30, 40]}

@jit
def use_pandas(a): # Function will not benefit from Numba jit
    df = pd.DataFrame.from_dict(a) # Numba doesn't know about pd.DataFrame
    df += 1                        # Numba doesn't understand what this is
    return df.cov()                # or this!

print(use_pandas(x))
Note that Pandas is not understood by Numba and as a result Numba would simply run this code via the interpreter but with the added cost of the Numba internal overheads!
What is nopython mode?
The Numba @jit decorator fundamentally operates in two compilation modes, nopython mode and object mode. In the go_fast example above, nopython=True is set in the @jit decorator; this is instructing Numba to operate in nopython mode. The behaviour of the nopython compilation mode is to essentially compile the decorated function so that it will run entirely without the involvement of the Python interpreter. This is the recommended and best-practice way to use the Numba jit decorator as it leads to the best performance.
Should the compilation in nopython mode fail, Numba can compile using object mode. This is a fall-back mode for the @jit decorator if nopython=True is not set (as seen in the use_pandas example above). In this mode Numba will identify loops that it can compile and compile those into functions that run in machine code, and it will run the rest of the code in the interpreter. For best performance avoid using this mode!
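To make the difference concrete, here is a minimal, hedged sketch (the dict-taking function is purely illustrative): forcing nopython mode on code Numba cannot compile raises an error at call time instead of silently falling back to object mode.

from numba import jit

@jit(nopython=True)
def use_dict(d):  # Plain Python dicts are not supported in nopython mode
    return d['a']

try:
    use_dict({'a': 1})
except Exception as e:  # Numba raises a typing error during compilation
    print(type(e).__name__)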
How to measure the performance of Numba?
First, recall that Numba has to compile your function for the argument types given before it executes the machine code version of your function; this takes time. However, once the compilation has taken place Numba caches the machine code version of your function for the particular types of arguments presented. If it is called again with the same types, it can reuse the cached version instead of having to compile again.
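As a hedged aside, this per-type caching can be observed on the dispatcher object itself (reusing go_fast and x from the first example above; .signatures lists the argument types compiled so far):

go_fast(x)                     # Compiles and caches for the integer array type
go_fast(x.astype(np.float64))  # New argument type, so a second compilation occurs
print(go_fast.signatures)      # One entry per compiled specialization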
A really common mistake when measuring performance is to not account for the above behaviour and to time code once with a simple timer that includes the time taken to compile your function in the execution time.
For example:
from numba import jit
import numpy as np
import time

x = np.arange(100).reshape(10, 10)

@jit(nopython=True)
def go_fast(a): # Function is compiled and runs in machine code
    trace = 0.0
    for i in range(a.shape[0]):
        trace += np.tanh(a[i, i])
    return a + trace

# DO NOT REPORT THIS... COMPILATION TIME IS INCLUDED IN THE EXECUTION TIME!
start = time.time()
go_fast(x)
end = time.time()
print("Elapsed (with compilation) = %s" % (end - start))

# NOW THE FUNCTION IS COMPILED, RE-TIME IT EXECUTING FROM CACHE
start = time.time()
go_fast(x)
end = time.time()
print("Elapsed (after compilation) = %s" % (end - start))
This, for example, prints:
Elapsed (with compilation) = 0.33030009269714355
Elapsed (after compilation) = 6.67572021484375e-06
A good way to measure the impact Numba JIT has on your code is to time execution using the timeit module functions; these measure multiple iterations of execution and, as a result, can be made to accommodate for the compilation time in the first execution.
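For instance, a minimal sketch using timeit (reusing go_fast and x from the example above, with an explicit warm-up call so compilation is kept out of the measurement):

import timeit

go_fast(x)  # Warm-up call: triggers compilation so it is excluded below
n_runs = 1000
elapsed = timeit.timeit(lambda: go_fast(x), number=n_runs)
print("Mean per call = %s" % (elapsed / n_runs))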
As a side note, if compilation time is an issue, Numba JIT supports on-disk caching of compiled functions and also has an Ahead-Of-Time compilation mode.
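On-disk caching is opt-in via the cache keyword argument; a minimal sketch (the square function is illustrative):

from numba import jit

@jit(nopython=True, cache=True)  # Machine code is written to an on-disk cache
def square(x):                   # and reused on later runs, skipping recompilation
    return x ** 2

print(square(3))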
How fast is it?
Assuming Numba can operate in nopython mode, or at least compile some loops, it will target compilation to your specific CPU. Speed up varies depending on application but can be one to two orders of magnitude. Numba has a performance guide that covers common options for gaining extra performance.
How does Numba work?
Numba reads the Python bytecode for a decorated function and combines this with information about the types of the input arguments to the function. It analyzes and optimizes your code, and finally uses the LLVM compiler library to generate a machine code version of your function, tailored to your CPU capabilities. This compiled version is then used every time your function is called.
Other things of interest:
Numba has quite a few decorators; we’ve seen @jit, but there’s also:
@njit - this is an alias for @jit(nopython=True) as it is so commonly used!
@vectorize - produces NumPy ufuncs (with all the ufunc methods supported). Docs are here.
@guvectorize - produces NumPy generalized ufuncs. Docs are here.
@stencil - declare a function as a kernel for a stencil-like operation. Docs are here.
@jitclass - for jit-aware classes. Docs are here.
@cfunc - declare a function for use as a native call back (to be called from C/C++ etc). Docs are here.
@overload - register your own implementation of a function for use in nopython mode, e.g. @overload(scipy.special.j0). Docs are here.
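As one illustration, here is a minimal sketch of @vectorize (the rel_diff function and the float64 signature are illustrative choices):

from numba import vectorize
import numpy as np

@vectorize(['float64(float64, float64)'])  # Compiles the scalar function into a NumPy ufunc
def rel_diff(x, y):
    return 2 * (x - y) / (x + y)

a = np.arange(1.0, 10.0)
print(rel_diff(a, a + 1.0))  # Broadcasts like any other ufunc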
Extra options available in some decorators:
parallel = True - enable the automatic parallelization of the function.
fastmath = True - enable fast-math behaviour for the function.
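A minimal sketch of parallel=True (the row_sums function is illustrative; prange marks the loop for automatic parallelization across CPU threads):

from numba import njit, prange
import numpy as np

@njit(parallel=True)
def row_sums(a):
    out = np.empty(a.shape[0])
    for i in prange(a.shape[0]):  # Iterations may run on different threads
        out[i] = a[i, :].sum()
    return out

print(row_sums(np.arange(100.0).reshape(10, 10)))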
ctypes/cffi/cython interoperability:
cffi - The calling of CFFI functions is supported in nopython mode.
ctypes - The calling of ctypes wrapped functions is supported in nopython mode.
Cython exported functions are callable.
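For example, a minimal ctypes sketch (assuming a Linux system where the C math library is available as libm.so.6; once restype and argtypes are set, the wrapped function can be called from nopython mode):

from ctypes import CDLL, c_double
from numba import njit

libm = CDLL("libm.so.6")  # Platform-specific name; adjust for macOS/Windows
c_cos = libm.cos
c_cos.restype = c_double
c_cos.argtypes = (c_double,)

@njit
def call_c_cos(x):
    return c_cos(x)  # ctypes-wrapped function called in nopython mode

print(call_c_cos(0.0))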
GPU targets:
Numba can target Nvidia CUDA and (experimentally) AMD ROC GPUs. You can write a kernel in pure Python and have Numba handle the computation and data movement (or do this explicitly). Click for Numba documentation on CUDA or ROC.
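To give a flavour, a minimal CUDA kernel sketch (requires an Nvidia GPU and a working CUDA installation; the add_one kernel and launch configuration are illustrative):

from numba import cuda
import numpy as np

@cuda.jit
def add_one(a):
    i = cuda.grid(1)  # Absolute index of this thread in the launch grid
    if i < a.size:
        a[i] += 1

a = np.arange(10)
d_a = cuda.to_device(a)    # Explicit data movement: host -> device
add_one[1, 32](d_a)        # Launch 1 block of 32 threads
print(d_a.copy_to_host())  # Explicit data movement: device -> host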