10. Release Notes

10.1. Version 0.43.0

In this release, the major new features are:

  • Initial support for statically typed dictionaries
  • Improvements to hash() to match Python 3 behavior
  • Support for the heapq module
  • Ability to pass C structs to Numba
  • More NumPy functions: asarray, trapz, roll, ptp, extract

NOTE:

The vast majority of NumPy 1.16 behaviour is supported; however, datetime and timedelta use involving NaT matches the behaviour present in earlier releases. The ufunc suite has not been extended to accommodate the two new time-computation-related additions present in NumPy 1.16. In addition, the functions ediff1d and interp have known minor issues in replicating outputs exactly when NaNs occur in certain input patterns.
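For reference, the functions mentioned in the note above propagate NaN in plain NumPy as shown below; inputs of this shape are where Numba's implementations may diverge slightly (a NumPy-only sketch, no Numba involved):

```python
import numpy as np

# Interpolating against data containing NaN propagates the NaN.
y = np.interp(1.5, [1.0, 2.0], [np.nan, 2.0])  # nan

# np.ediff1d over NaN-containing input likewise yields NaN differences.
d = np.ediff1d(np.array([1.0, np.nan]))  # array([nan])
```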

General Enhancements:

  • PR #3563: Support for np.roll
  • PR #3572: Support for np.ptp
  • PR #3592: Add dead branch prune before type inference.
  • PR #3598: Implement np.asarray()
  • PR #3604: Support for np.interp
  • PR #3607: Some simplification to lowering
  • PR #3612: Exact match flag in dispatcher
  • PR #3627: Support for np.trapz
  • PR #3630: np.where with broadcasting
  • PR #3633: Support for np.extract
  • PR #3657: np.max, np.min, np.nanmax, np.nanmin - support for complex dtypes
  • PR #3661: Access C Struct as Numpy Structured Array
  • PR #3678: Support for str.split and str.join
  • PR #3684: Support C array in C struct
  • PR #3696: Add intrinsic to help debug refcount
  • PR #3703: Implementations of type hashing.
  • PR #3715: Port CPython3.7 dictionary for numba internal use
  • PR #3716: Support inplace concat of strings
  • PR #3718: Add location to ConstantInferenceError exceptions.
  • PR #3720: improve error msg about invalid signature
  • PR #3731: Support for heapq
  • PR #3754: Updates for llvmlite 0.28
  • PR #3760: Overloadable operator.setitem
  • PR #3775: Support overloading operator.delitem
  • PR #3777: Implement compiler support for dictionary
  • PR #3791: Implement interpreter-side interface for numba dict
  • PR #3799: Support refcount’ed types in numba dict
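The heapq support (PR #3731) mirrors the standard-library semantics, so code like the following can now also run inside a nopython-mode function (shown here as plain Python for brevity):

```python
import heapq

h = []
for value in (5, 1, 4, 2):
    heapq.heappush(h, value)  # maintain the min-heap invariant

smallest = heapq.heappop(h)  # pops the minimum element, 1
# after the pop, the new heap root is the next-smallest element, 2
```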

CUDA Enhancements/Fixes:

  • PR #3713: Fix the NvvmSupportError message when CC too low
  • PR #3722: Fix #3705: slicing error with negative strides
  • PR #3755: Make cuda.to_device accept readonly host array
  • PR #3773: Adapt library search to accommodate multiple locations

Documentation Updates:

  • PR #3651: fix link to berryconda in docs
  • PR #3668: Add Azure Pipelines build badge
  • PR #3749: DOC: Clarify when prange is different from range
  • PR #3771: fix a few typos
  • PR #3785: Clarify use of range as function only.
  • PR #3829: Add docs for typed-dict

Fixes:

  • PR #3614: Resolve #3586
  • PR #3618: Skip gdb tests on ARM.
  • PR #3643: Remove support_literals usage
  • PR #3645: Enforce and fix that AbstractTemplate.generic must be returning a Signature
  • PR #3648: Fail on @overload signature mismatch.
  • PR #3660: Added Ignore message to test numba.tests.test_lists.TestLists.test_mul_error
  • PR #3662: Replace six with numba.six
  • PR #3663: Removes coverage computation from travisci builds
  • PR #3672: Avoid leaking memory when iterating over uniform tuple
  • PR #3676: Fixes constant string lowering inside tuples
  • PR #3677: Ensure all referenced compiled functions are linked properly
  • PR #3692: Fix test failure due to overly strict test on floating point values.
  • PR #3693: Intercept failed import to help users.
  • PR #3694: Fix memory leak in enumerate iterator
  • PR #3695: Convert return of None from intrinsic implementation to dummy value
  • PR #3697: Fix for issue #3687
  • PR #3701: Fix array.T analysis (fixes #3700)
  • PR #3704: Fixes for overload_method
  • PR #3706: Don’t push call vars recursively into nested parfors. Resolves #3686.
  • PR #3710: Set as non-hoistable if a mutable variable is passed to a function in a loop. Resolves #3699.
  • PR #3712: parallel=True to use better builtin mechanism to resolve call types. Resolves issue #3671
  • PR #3725: Fix invalid removal of dead empty list
  • PR #3740: add uintp as a valid type to the tuple operator.getitem
  • PR #3758: Fix target definition update in inlining
  • PR #3782: Raise typing error on yield optional.
  • PR #3792: Fix non-module object used as the module of a function.
  • PR #3800: Bugfix for np.interp
  • PR #3808: Bump macro to include VS2014 to fix py3.5 build
  • PR #3809: Add debug guard to debug only C function.
  • PR #3816: Fix array.sum(axis) 1d input return type.
  • PR #3821: Replace PySys_WriteStdout with PySys_FormatStdout to ensure no truncation.
  • PR #3830: Getitem should not return optional type
  • PR #3832: Handle single string as path in find_file()

Contributors:

  • Ehsan Totoni
  • Gryllos Prokopis
  • Jonathan J. Helmus
  • Kayla Ngan
  • lalitparate
  • luk-f-a
  • Matyt
  • Max Bolingbroke
  • Michael Seifert
  • Rob Ennis
  • Siu Kwan Lam
  • Stan Seibert
  • Stuart Archibald
  • Todd A. Anderson
  • Tao He
  • Valentin Haenel

10.2. Version 0.42.1

Bugfix release to fix the incorrect hash in OSX wheel packages. No change in source code.

10.3. Version 0.42.0

In this release the major features are:

  • The capability to launch and attach the GDB debugger from within a jitted function.
  • The upgrading of LLVM to version 7.0.0.

We added a draft of the project roadmap to the developer manual. The roadmap is for informational purposes only as priorities and resources may change.

Here are some enhancements from contributed PRs:

  • #3532. Daniel Wennberg improved the cuda.{pinned, mapped} API so that the associated memory is released immediately at the exit of the context manager.
  • #3531. Dimitri Vorona enabled the inlining of jitclass methods.
  • #3516. Simon Perkins added the support for passing numpy dtypes (i.e. np.dtype("int32")) and their type constructor (i.e. np.int32) into a jitted function.
  • #3509. Rob Ennis added support for np.corrcoef.
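For example, the newly supported np.corrcoef follows NumPy's semantics; this NumPy-only sketch shows the behaviour a jitted version is expected to reproduce:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x  # perfectly correlated with x

# Returns the 2x2 correlation matrix; the off-diagonal entries
# hold the correlation coefficient (1.0 for this input).
r = np.corrcoef(x, y)
```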

A regression issue (#3554, #3461) relating to making an empty slice in parallel mode is resolved by #3558.

General Enhancements:

  • PR #3392: Launch and attach gdb directly from Numba.
  • PR #3437: Changes to accommodate LLVM 7.0.x
  • PR #3509: Support for np.corrcoef
  • PR #3516: Typeof dtype values
  • PR #3520: Fix @stencil ignoring cval if out kwarg supplied.
  • PR #3531: Fix jitclass method inlining and avoid unnecessary increfs
  • PR #3538: Avoid future C-level assertion error due to invalid visibility
  • PR #3543: Avoid implementation error being hidden by the try-except
  • PR #3544: Add long_running test flag and feature to exclude tests.
  • PR #3549: ParallelAccelerator caching improvements
  • PR #3558: Fixes array analysis for inplace binary operators.
  • PR #3566: Skip alignment tests on armv7l.
  • PR #3567: Fix unifying literal types in namedtuple
  • PR #3576: Add special copy routine for NumPy out arrays
  • PR #3577: Fix example and docs typos for objmode context manager. reorder statements.
  • PR #3580: Use alias information when determining whether it is safe to
  • PR #3583: Use ir.unknown_loc for unknown Loc, as #3390 with tests
  • PR #3587: Fix llvm.memset usage changes in llvm7
  • PR #3596: Fix Array Analysis for Global Namedtuples
  • PR #3597: Warn users if threading backend init unsafe.
  • PR #3605: Add guard for writing to read only arrays from ufunc calls
  • PR #3606: Improve the accuracy of error message wording for undefined type.
  • PR #3611: gdb test guard needs to ack ptrace permissions
  • PR #3616: Skip gdb tests on ARM.

CUDA Enhancements:

  • PR #3532: Unregister temporarily pinned host arrays at once
  • PR #3552: Handle broadcast arrays correctly in host->device transfer.
  • PR #3578: Align cuda and cuda simulator kwarg names.

Documentation Updates:

  • PR #3545: Fix @njit description in 5 min guide
  • PR #3570: Minor documentation fixes for numba.cuda
  • PR #3581: Fixing minor typo in reference/types.rst
  • PR #3594: Changing @stencil docs to correctly reflect func_or_mode param
  • PR #3617: Draft roadmap as of Dec 2018

Contributors:

  • Aaron Critchley
  • Daniel Wennberg
  • Dimitri Vorona
  • Dominik Stańczak
  • Ehsan Totoni (core dev)
  • Iskander Sharipov
  • Rob Ennis
  • Simon Muller
  • Simon Perkins
  • Siu Kwan Lam (core dev)
  • Stan Seibert (core dev)
  • Stuart Archibald (core dev)
  • Todd A. Anderson (core dev)

10.4. Version 0.41.0

This release adds the following major features:

  • Diagnostics showing the optimizations done by ParallelAccelerator
  • Support for profiling Numba-compiled functions in Intel VTune
  • Additional NumPy functions: partition, nancumsum, nancumprod, ediff1d, cov, conj, conjugate, tri, tril, triu
  • Initial support for Python 3 Unicode strings
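The NaN-aware cumulative functions and np.partition added here behave like their NumPy counterparts; a quick NumPy-only illustration:

```python
import numpy as np

# NaNs are treated as zero (sum) or one (product) in the cumulative result.
np.nancumsum(np.array([1.0, np.nan, 2.0]))   # array([1., 1., 3.])
np.nancumprod(np.array([2.0, np.nan, 3.0]))  # array([2., 2., 6.])

# np.partition places the k-th smallest element at index k,
# with everything to its left no larger than it.
out = np.partition(np.array([3, 1, 2]), 1)  # out[1] == 2
```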

General Enhancements:

  • PR #1968: armv7 support
  • PR #2983: invert mapping b/w binop operators and the operator module #2297
  • PR #3160: First attempt at parallel diagnostics
  • PR #3307: Adding NUMBA_ENABLE_PROFILING envvar, enabling jit event
  • PR #3320: Support for np.partition
  • PR #3324: Support for np.nancumsum and np.nancumprod
  • PR #3325: Add location information to exceptions.
  • PR #3337: Support for np.ediff1d
  • PR #3345: Support for np.cov
  • PR #3348: Support user pipeline class in with lifting
  • PR #3363: string support
  • PR #3373: Improve error message for empty imprecise lists.
  • PR #3375: Enable overload(operator.getitem)
  • PR #3402: Support negative indexing in tuple.
  • PR #3414: Refactor Const type
  • PR #3416: Optimized usage of alloca out of the loop
  • PR #3424: Updates for llvmlite 0.26
  • PR #3462: Add support for np.conj/np.conjugate.
  • PR #3480: np.tri, np.tril, np.triu - default optional args
  • PR #3481: Permit dtype argument as sole kwarg in np.eye

CUDA Enhancements:

  • PR #3399: Add max_registers Option to cuda.jit

Continuous Integration / Testing:

  • PR #3303: CI with Azure Pipelines
  • PR #3309: Workaround race condition with apt
  • PR #3371: Fix issues with Azure Pipelines
  • PR #3362: Fix #3360: RuntimeWarning: ‘numba.runtests’ found in sys.modules
  • PR #3374: Disable openmp in wheel building
  • PR #3404: Azure Pipelines templates
  • PR #3419: Fix cuda tests and error reporting in test discovery
  • PR #3491: Prevent faulthandler installation on armv7l
  • PR #3493: Fix CUDA test that used negative indexing behaviour that’s fixed.
  • PR #3495: Start Flake8 checking of Numba source

Fixes:

  • PR #2950: Fix dispatcher to only consider contiguous-ness.
  • PR #3124: Fix 3119, raise for 0d arrays in reductions
  • PR #3228: Reduce redundant module linking
  • PR #3329: Fix AOT on windows.
  • PR #3335: Fix memory management of __cuda_array_interface__ views.
  • PR #3340: Fix typo in error name.
  • PR #3365: Fix the default unboxing logic
  • PR #3367: Allow non-global reference to objmode() context-manager
  • PR #3381: Fix global reference in objmode for dynamically created function
  • PR #3382: CUDA_ERROR_MISALIGNED_ADDRESS Using Multiple Const Arrays
  • PR #3384: Correctly handle very old versions of colorama
  • PR #3394: Add 32bit package guard for non-32bit installs
  • PR #3397: Fix with-objmode warning
  • PR #3403 Fix label offset in call inline after parfor pass
  • PR #3429: Fixes raising of user defined exceptions for exec(<string>).
  • PR #3432: Fix error due to function naming in CI in py2.7
  • PR #3444: Fixed TBB’s single thread execution and test added for #3440
  • PR #3449: Allow matching non-array objects in find_callname()
  • PR #3455: Change getiter and iternext to not be pure. Resolves #3425
  • PR #3467: Make ir.UndefinedType singleton class.
  • PR #3478: Fix np.random.shuffle sideeffect
  • PR #3487: Raise unsupported for kwargs given to print()
  • PR #3488: Remove dead script.
  • PR #3498: Fix stencil support for boolean as return type
  • PR #3511: Fix handling make_function literals (regression of #3414)
  • PR #3514: Add missing unicode != unicode
  • PR #3527: Fix complex math sqrt implementation for large -ve values
  • PR #3530: This adds an arg check for the pattern supplied to Parfors.
  • PR #3536: Sets list dtor linkage to linkonce_odr to fix visibility in AOT

Documentation Updates:

  • PR #3316: Update 0.40 changelog with additional PRs
  • PR #3318: Tweak spacing to avoid search box wrapping onto second line
  • PR #3321: Add note about memory leaks with exceptions to docs. Fixes #3263
  • PR #3322: Add FAQ on CUDA + fork issue. Fixes #3315.
  • PR #3343: Update docs for argsort, kind kwarg partially supported.
  • PR #3357: Added mention of njit in 5minguide.rst
  • PR #3434: Fix parallel reduction example in docs.
  • PR #3452: Fix broken link and mark up problem.
  • PR #3484: Size Numba logo in docs in em units. Fixes #3313
  • PR #3502: just two typos
  • PR #3506: Document string support
  • PR #3513: Documentation for parallel diagnostics.
  • PR #3526: Fix 5 min guide with respect to @njit decl

Contributors:

  • Alex Ford
  • Andreas Sodeur
  • Anton Malakhov
  • Daniel Stender
  • Ehsan Totoni (core dev)
  • Henry Schreiner
  • Marcel Bargull
  • Matt Cooper
  • Nick White
  • Nicolas Hug
  • rjenc29
  • Siu Kwan Lam (core dev)
  • Stan Seibert (core dev)
  • Stuart Archibald (core dev)
  • Todd A. Anderson (core dev)

10.5. Version 0.40.1

This is a PyPI-only patch release to ensure that PyPI wheels can enable the TBB threading backend, and to disable the OpenMP backend in the wheels. Limitations of manylinux1 and variation in user environments can cause segfaults when OpenMP is enabled on wheel builds. Note that this release has no functional changes for users who obtained Numba 0.40.0 via conda.

Patches:

  • PR #3338: Accidentally left Anton off contributor list for 0.40.0
  • PR #3374: Disable OpenMP in wheel building
  • PR #3376: Update 0.40.1 changelog and docs on OpenMP backend

10.6. Version 0.40.0

This release adds a number of major features:

  • A new GPU backend: kernels for AMD GPUs can now be compiled using the ROCm driver on Linux.
  • The thread pool implementation used by Numba for automatic multithreading is configurable to use TBB, OpenMP, or the old “workqueue” implementation. (TBB is likely to become the preferred default in a future release.)
  • New documentation on thread and fork-safety with Numba, along with overall improvements in thread-safety.
  • Experimental support for executing a block of code inside a nopython mode function in object mode.
  • Parallel loops now allow arrays as reduction variables
  • CUDA improvements: FMA, faster float64 atomics on supporting hardware, records in const memory, and improved datetime dtype support
  • More NumPy functions: vander, tri, triu, tril, fill_diagonal
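The triangular-matrix helpers and np.fill_diagonal added in this release match NumPy's semantics; for reference:

```python
import numpy as np

a = np.ones((3, 3))
lower = np.tril(a)  # zeros everything above the main diagonal
upper = np.triu(a)  # zeros everything below the main diagonal

b = np.zeros((2, 2))
np.fill_diagonal(b, 7)  # in place: writes 7 along the main diagonal
```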

General Enhancements:

  • PR #3017: Add facility to support with-contexts
  • PR #3033: Add support for multidimensional CFFI arrays
  • PR #3122: Add inliner to object mode pipeline
  • PR #3127: Support for reductions on arrays.
  • PR #3145: Support for np.fill_diagonal
  • PR #3151: Keep a queue of references to last N deserialized functions. Fixes #3026
  • PR #3154: Support use of list() if typeable.
  • PR #3166: Objmode with-block
  • PR #3179: Updates for llvmlite 0.25
  • PR #3181: Support function extension in alias analysis
  • PR #3189: Support literal constants in typing of object methods
  • PR #3190: Support passing closures as literal values in typing
  • PR #3199: Support inferring stencil index as constant in simple unary expressions
  • PR #3202: Threading layer backend refactor/rewrite/reinvention!
  • PR #3209: Support for np.tri, np.tril and np.triu
  • PR #3211: Handle unpacking in building tuple (BUILD_TUPLE_UNPACK opcode)
  • PR #3212: Support for np.vander
  • PR #3227: Add NumPy 1.15 support
  • PR #3272: Add MemInfo_data to runtime._nrt_python.c_helpers
  • PR #3273: Refactor. Removing thread-local-storage based context nesting.
  • PR #3278: compiler threadsafety lockdown
  • PR #3291: Add CPU count and CFS restrictions info to numba -s.

CUDA Enhancements:

  • PR #3152: Use cuda driver api to get best blocksize for best occupancy
  • PR #3165: Add FMA intrinsic support
  • PR #3172: Use float64 add Atomics, Where Available
  • PR #3186: Support Records in CUDA Const Memory
  • PR #3191: CUDA: fix log size
  • PR #3198: Fix GPU datetime timedelta types usage
  • PR #3221: Support datetime/timedelta scalar argument to a CUDA kernel.
  • PR #3259: Add DeviceNDArray.view method to reinterpret data as a different type.
  • PR #3310: Fix IPC handling of sliced cuda array.

ROCm Enhancements:

  • PR #3023: Support for AMDGCN/ROCm.
  • PR #3108: Add ROC info to numba -s output.
  • PR #3176: Move ROC vectorize init to npyufunc
  • PR #3177: Add auto_synchronize support to ROC stream
  • PR #3178: Update ROC target documentation.
  • PR #3294: Add compiler lock to ROC compilation path.
  • PR #3280: Add wavebits property to the HSA Agent.
  • PR #3281: Fix ds_permute types and add tests

Continuous Integration / Testing:

  • PR #3091: Remove old recipes, switch to test config based on env var.
  • PR #3094: Add higher ULP tolerance for products in complex space.
  • PR #3096: Set exit on error in incremental scripts
  • PR #3109: Add skip to test needing jinja2 if no jinja2.
  • PR #3125: Skip cudasim only tests
  • PR #3126: add slack, drop flowdock
  • PR #3147: Improve error message for arg type unsupported during typing.
  • PR #3128: Fix recipe/build for jetson tx2/ARM
  • PR #3167: In build script activate env before installing.
  • PR #3180: Add skip to broken test.
  • PR #3216: Fix libcuda.so loading in some container setup
  • PR #3224: Switch to new Gitter notification webhook URL and encrypt it
  • PR #3235: Add 32bit Travis CI jobs
  • PR #3257: This adds scipy/ipython back into windows conda test phase.

Fixes:

  • PR #3038: Fix random integer generation to match results from NumPy.
  • PR #3045: Fix #3027 - Numba reassigns sys.stdout
  • PR #3059: Handler for known LoweringErrors.
  • PR #3060: Adjust attribute error for NumPy functions.
  • PR #3067: Abort simulator threads on exception in thread block.
  • PR #3079: Implement +/-(types.boolean) Fix #2624
  • PR #3080: Compute np.var and np.std correctly for complex types.
  • PR #3088: Fix #3066 (array.dtype.type in prange)
  • PR #3089: Fix invalid ParallelAccelerator hoisting issue.
  • PR #3136: Fix #3135 (lowering error)
  • PR #3137: Fix for issue3103 (race condition detection)
  • PR #3142: Fix Issue #3139 (parfors reuse of reduction variable across prange blocks)
  • PR #3148: Remove dead array equal @infer code
  • PR #3153: Fix canonicalize_array_math typing for calls with kw args
  • PR #3156: Fixes issue with missing pygments in testing and adds guards.
  • PR #3168: Py37 bytes output fix.
  • PR #3171: Fix #3146. Fix CFUNCTYPE void* return-type handling
  • PR #3193: Fix setitem/getitem resolvers
  • PR #3222: Fix #3214. Mishandling of POP_BLOCK in while True loop.
  • PR #3230: Fixes liveness analysis issue in looplifting
  • PR #3233: Fix return type difference for 32bit ctypes.c_void_p
  • PR #3234: Fix types and layout for np.where.
  • PR #3237: Fix DeprecationWarning about imp module
  • PR #3241: Fix #3225. Normalize 0nd array to scalar in typing of indexing code.
  • PR #3256: Fix #3251: Move imports of ABCs to collections.abc for Python >= 3.3
  • PR #3292: Fix issue3279.
  • PR #3302: Fix error due to mismatching dtype

Documentation Updates:

  • PR #3104: Workaround for #3098 (test_optional_unpack Heisenbug)
  • PR #3132: Adds an ~5 minute guide to Numba.
  • PR #3194: Fix docs RE: np.random generator fork/thread safety
  • PR #3242: Page with Numba talks and tutorial links
  • PR #3258: Allow users to choose the type of issue they are reporting.
  • PR #3260: Fixed broken link
  • PR #3266: Fix cuda pointer ownership problem with user/externally allocated pointer
  • PR #3269: Tweak typography with CSS
  • PR #3270: Update FAQ for functions passed as arguments
  • PR #3274: Update installation instructions
  • PR #3275: Note pyobject and voidptr are types in docs
  • PR #3288: Do not need to call parallel optimizations “experimental” anymore
  • PR #3318: Tweak spacing to avoid search box wrapping onto second line

Contributors:

  • Anton Malakhov
  • Alex Ford
  • Anthony Bisulco
  • Ehsan Totoni (core dev)
  • Leonard Lausen
  • Matthew Petroff
  • Nick White
  • Ray Donnelly
  • rjenc29
  • Siu Kwan Lam (core dev)
  • Stan Seibert (core dev)
  • Stuart Archibald (core dev)
  • Stuart Reynolds
  • Todd A. Anderson (core dev)

10.7. Version 0.39.0

Here are the highlights for the Numba 0.39.0 release.

  • This is the first version that supports Python 3.7.
  • With help from Intel, we have fixed the issues with SVML support (related issues #2938, #2998, #3006).
  • List has gained support for containing reference-counted types like NumPy arrays and lists. Note that lists still cannot hold heterogeneous types.
  • We have made a significant change to the internal calling convention, which should be transparent to most users, to allow for a future feature that will permit jumping back into python-mode from a nopython-mode function. This also fixes a limitation of print that disabled its use from nopython functions deep in the call stack.
  • For CUDA GPU support, we added a __cuda_array_interface__ following the NumPy array interface specification to allow Numba to consume externally defined device arrays. We have opened a corresponding pull request to CuPy to test out the concept and be able to use a CuPy GPU array.
  • The Numba dispatcher inspect_types() method now supports the kwarg pretty which, if set to True, will produce ANSI/HTML output showing the annotated types when invoked from ipython/jupyter-notebook respectively.
  • The NumPy functions ndarray.dot, np.percentile and np.nanpercentile, and np.unique are now supported.
  • Numba now supports the use of a per-project configuration file to permanently set behaviours typically set via NUMBA_* family environment variables.
  • Support for the ppc64le architecture has been added.
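The newly supported NumPy functions follow their NumPy semantics; a NumPy-only reference sketch:

```python
import numpy as np

u = np.unique(np.array([2, 1, 2, 3]))          # array([1, 2, 3])
p = np.percentile(np.array([1, 2, 3, 4]), 50)  # 2.5 (linear interpolation)
s = np.array([1, 2]).dot(np.array([3, 4]))     # 11, via the ndarray.dot method
```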

Enhancements:

  • PR #2793: Simplify and remove javascript from html_annotate templates.
  • PR #2840: Support list of refcounted types
  • PR #2902: Support for np.unique
  • PR #2926: Enable fence for all architecture and add developer notes
  • PR #2928: Making error about untyped list more informative.
  • PR #2930: Add configuration file and color schemes.
  • PR #2932: Fix encoding to ‘UTF-8’ in check_output decode.
  • PR #2938: Python 3.7 compat: _Py_Finalizing becomes _Py_IsFinalizing()
  • PR #2939: Comprehensive SVML unit test
  • PR #2946: Add support for ndarray.dot method and tests.
  • PR #2953: percentile and nanpercentile
  • PR #2957: Add new 3.7 opcode support.
  • PR #2963: Improve alias analysis to be more comprehensive
  • PR #2984: Support for namedtuples in array analysis
  • PR #2986: Fix environment propagation
  • PR #2990: Improve function call matching for intrinsics
  • PR #3002: Second pass at error rewrites (interpreter errors).
  • PR #3004: Add numpy.empty to the list of pure functions.
  • PR #3008: Augment SVML detection with llvmlite SVML patch detection.
  • PR #3012: Make use of the common spelling of heterogeneous/homogeneous.
  • PR #3032: Fix pycc ctypes test due to mismatch in calling-convention
  • PR #3039: Add SVML detection to Numba environment diagnostic tool.
  • PR #3041: This adds @needs_blas to tests that use BLAS
  • PR #3056: Require llvmlite>=0.24.0

CUDA Enhancements:

  • PR #2860: __cuda_array_interface__
  • PR #2910: More CUDA intrinsics
  • PR #2929: Add Flag To Prevent Unnecessary D->H Copies
  • PR #3037: Add CUDA IPC support on non-peer-accessible devices

CI Enhancements:

  • PR #3021: Update appveyor config.
  • PR #3040: Add fault handler to all builds
  • PR #3042: Add catchsegv
  • PR #3077: Adds optional number of processes for -m in testing

Fixes:

  • PR #2897: Fix line position of delete statement in numba ir
  • PR #2905: Fix for #2862
  • PR #3009: Fix optional type returning in recursive call
  • PR #3019: workaround and unittest for issue #3016
  • PR #3035: [TESTING] Attempt delayed removal of Env
  • PR #3048: [WIP] Fix cuda tests failure on buildfarm
  • PR #3054: Make test work on 32-bit
  • PR #3062: Fix cuda.In freeing devary before the kernel launch
  • PR #3073: Workaround #3072
  • PR #3076: Avoid ignored exception due to missing globals at interpreter teardown

Documentation Updates:

  • PR #2966: Fix syntax in env var docs.
  • PR #2967: Fix typo in CUDA kernel layout example.
  • PR #2970: Fix docstring copy paste error.

Contributors:

The following people contributed to this release.

  • Anton Malakhov
  • Ehsan Totoni (core dev)
  • Julia Tatz
  • Matthias Bussonnier
  • Nick White
  • Ray Donnelly
  • Siu Kwan Lam (core dev)
  • Stan Seibert (core dev)
  • Stuart Archibald (core dev)
  • Todd A. Anderson (core dev)
  • Rik-de-Kort
  • rjenc29

10.8. Version 0.38.1

This is a critical bug fix release addressing: https://github.com/numba/numba/issues/3006

The bug does not impact users using conda packages from Anaconda or Intel Python Distribution (but it does impact conda-forge). It does not impact users of pip using wheels from PyPI.

This only impacts a small number of users where:

  • The ICC runtime (specifically libsvml) is present in the user’s environment.
  • The user is using an llvmlite statically linked against a version of LLVM that has not been patched with SVML support.
  • The platform is 64-bit.

The release fixes a code generation path that could lead to the production of incorrect results under the above situation.

Fixes:

  • PR #3007: Augment SVML detection with llvmlite SVML patch detection.

Contributors:

The following people contributed to this release.

  • Stuart Archibald (core dev)

10.9. Version 0.38.0

Following on from the bug fix focus of the last release, this release swings back towards the addition of new features and usability improvements based on community feedback. This release is comparatively large! Three key features/changes to note are:

  • Numba (via llvmlite) is now backed by LLVM 6.0, general vectorization is improved as a result. A significant long standing LLVM bug that was causing corruption was also found and fixed.
  • Further considerable improvements in vectorization are made available as Numba now supports Intel’s short vector math library (SVML). Try it out with conda install -c numba icc_rt.
  • CUDA 8.0 is now the minimum supported CUDA version.

Other highlights include:

  • Bug fixes to parallel=True have enabled more vectorization opportunities when using the ParallelAccelerator technology.
  • Much effort has gone into improving error reporting and the general usability of Numba. This includes highlighted error messages and performance tips documentation. Try it out with conda install colorama.
  • A number of new NumPy functions are supported: np.convolve, np.correlate, np.reshape, np.transpose, np.permutation, np.real and np.imag. Additionally, np.searchsorted now supports the side kwarg, and np.argsort now supports the kind kwarg with quicksort and mergesort available.
  • The Numba extension API has gained the ability to operate more easily with functions from Cython modules through the use of numba.extending.get_cython_function_address to obtain function addresses for direct use in ctypes.CFUNCTYPE.
  • Numba now allows the passing of jitted functions (and containers of jitted functions) as arguments to other jitted functions.
  • The CUDA functionality has gained support for a larger selection of bit manipulation intrinsics as well as SELP, and has had a number of bugs fixed.
  • Initial work to support the PPC64LE platform has been added, full support is however waiting on the LLVM 6.0.1 release as it contains critical patches not present in 6.0.0. It is hoped that any remaining issues will be fixed in the next release.
  • The capacity for advanced users/compiler engineers to define their own compilation pipelines.
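The new side and kind kwargs follow NumPy's semantics, illustrated here with plain NumPy:

```python
import numpy as np

a = np.array([1, 3, 3, 5])
np.searchsorted(a, 3)                # 1: leftmost insertion point for 3
np.searchsorted(a, 3, side='right')  # 3: rightmost insertion point for 3

x = np.array([3, 1, 2])
np.argsort(x, kind='mergesort')      # array([1, 2, 0]); mergesort is stable
```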

Enhancements:

  • PR #2660: Support bools from cffi in nopython.
  • PR #2741: Enhance error message for undefined variables.
  • PR #2744: Add diagnostic error message to test suite discovery failure.
  • PR #2748: Added Intel SVML optimizations as opt-out choice working by default
  • PR #2762: Support transpose with axes arguments.
  • PR #2777: Add support for np.correlate and np.convolve
  • PR #2779: Implement np.random.permutation
  • PR #2801: Passing jitted functions as args
  • PR #2802: Support np.real() and np.imag()
  • PR #2807: Expose import_cython_function
  • PR #2821: Add kwarg ‘side’ to np.searchsorted
  • PR #2822: Adds stable argsort
  • PR #2832: Fixups for llvmlite 0.23/llvm 6
  • PR #2836: Support index method on tuples
  • PR #2839: Support for np.transpose and np.reshape.
  • PR #2843: Custom pipeline
  • PR #2847: Replace signed array access indices in unsigned prange loop body
  • PR #2859: Add support for improved error reporting.
  • PR #2880: This adds a github issue template.
  • PR #2881: Build recipe to clone Intel ICC runtime.
  • PR #2882: Update TravisCI to test SVML
  • PR #2893: Add reference to the data buffer in array.ctypes object
  • PR #2895: Move to CUDA 8.0

Fixes:

  • PR #2737: Fix #2007 (part 1). Empty array handling in np.linalg.
  • PR #2738: Fix install_requires to allow pip getting pre-release version
  • PR #2740: Fix 2208. Generate better error message.
  • PR #2765: Fix Bit-ness
  • PR #2780: PowerPC reference counting memory fences
  • PR #2805: Fix six imports.
  • PR #2813: Fix #2812: gufunc scalar output bug.
  • PR #2814: Fix the build post #2727
  • PR #2831: Attempt to fix #2473
  • PR #2842: Fix issue with test discovery and broken CUDA drivers.
  • PR #2850: Add rtsys init guard and test.
  • PR #2852: Skip vectorization test with targets that are not x86
  • PR #2856: Prevent printing to stdout in test_extending.py
  • PR #2864: Correct C code to prevent compiler warnings.
  • PR #2889: Attempt to fix #2386.
  • PR #2891: Removed test skipping for inspect_cfg
  • PR #2898: Add guard to parallel test on unsupported platforms
  • PR #2907: Update change log for PPC64LE LLVM dependency.
  • PR #2911: Move build requirement to llvmlite>=0.23.0dev0
  • PR #2912: Fix random permutation test.
  • PR #2914: Fix MD list syntax in issue template.

Documentation Updates:

  • PR #2739: Explicitly state default value of error_model in docstring
  • PR #2803: DOC: parallel vectorize requires signatures
  • PR #2829: Add Python 2.7 EOL plan to docs
  • PR #2838: Use automatic numbering syntax in list.
  • PR #2877: Add performance tips documentation.
  • PR #2883: Fix #2872: update rng doc about thread/fork-safety
  • PR #2908: Add missing link and ref to docs.
  • PR #2909: Tiny typo correction

ParallelAccelerator enhancements/fixes:

  • PR #2727: Changes to enable vectorization in ParallelAccelerator.
  • PR #2816: Array analysis for transpose with arbitrary arguments
  • PR #2874: Fix dead code eliminator not to remove a call with side-effect
  • PR #2886: Fix ParallelAccelerator arrayexpr repr

CUDA enhancements:

  • PR #2734: More Constants From cuda.h
  • PR #2767: Add len(..) Support to DeviceNDArray
  • PR #2778: Add More Device Array API Functions to CUDA Simulator
  • PR #2824: Add CUDA Primitives for Population Count
  • PR #2835: Emit selp Instructions to Avoid Branching
  • PR #2867: Full support for CUDA device attributes

CUDA fixes:

  • PR #2768: Don’t Compile Code on Every Assignment
  • PR #2878: Fixes a Win64 issue with the test in Pr/2865

Contributors:

The following people contributed to this release.

  • Abutalib Aghayev
  • Alex Olivas
  • Anton Malakhov
  • Dong-hee Na
  • Ehsan Totoni (core dev)
  • John Zwinck
  • Josh Wilson
  • Kelsey Jordahl
  • Nick White
  • Olexa Bilaniuk
  • Rik-de-Kort
  • Siu Kwan Lam (core dev)
  • Stan Seibert (core dev)
  • Stuart Archibald (core dev)
  • Thomas Arildsen
  • Todd A. Anderson (core dev)

10.10. Version 0.37.0

This release focuses on bug fixing and stability but also adds a few new features, including support for Numpy 1.14. The key change to Numba core was the long-awaited addition of the final tranche of thread-safety improvements, which allow Numba to be run concurrently on multiple threads without hitting known thread-safety issues inside LLVM itself. Further, a number of fixes and enhancements went into the CUDA implementation, and ParallelAccelerator gained some new features and underwent some internal refactoring.

Misc enhancements:

  • PR #2627: Remove hacks to make llvmlite threadsafe
  • PR #2672: Add ascontiguousarray
  • PR #2678: Add Gitter badge
  • PR #2691: Fix #2690: add intrinsic to convert array to tuple
  • PR #2703: Test runner feature: failed-first and last-failed
  • PR #2708: Patch for issue #1907
  • PR #2732: Add support for array.fill

Misc Fixes:

  • PR #2610: Fix #2606 lowering of optional.setattr
  • PR #2650: Remove skip for win32 cosine test
  • PR #2668: Fix empty_like from readonly arrays.
  • PR #2682: Fixes 2210, remove _DisableJitWrapper
  • PR #2684: Fix #2340, generator error yielding bool
  • PR #2693: Add travis-ci testing of NumPy 1.14, and also check on Python 2.7
  • PR #2694: Avoid type inference failure due to a typing template rejection
  • PR #2695: Update llvmlite version dependency.
  • PR #2696: Fix tuple indexing codegeneration for empty tuple
  • PR #2698: Fix #2697 by deferring deletion in the simplify_CFG loop.
  • PR #2701: Small fix to avoid tempfiles being created in the current directory
  • PR #2725: Fix 2481, LLVM IR parsing error due to mutated IR
  • PR #2726: Fix #2673: incorrect fork error msg.
  • PR #2728: Alternative to #2620. Remove dead code ByteCodeInst.get.
  • PR #2730: Add guard for test needing SciPy/BLAS

Documentation updates:

  • PR #2670: Update communication channels
  • PR #2671: Add docs about diagnosing loop vectorizer
  • PR #2683: Add docs on const arg requirements and on const mem alloc
  • PR #2722: Add docs on numpy support in cuda
  • PR #2724: Update doc: warning about unsupported arguments

ParallelAccelerator enhancements/fixes:

Parallel support for np.arange and np.linspace, also np.mean, np.std and np.var are added. This was performed as part of a general refactor and cleanup of the core ParallelAccelerator code.

  • PR #2674: Core pa
  • PR #2704: Generate Dels after parfor sequential lowering
  • PR #2716: Handle matching directly supported functions

CUDA enhancements:

  • PR #2665: CUDA DeviceNDArray: Support numpy transpose API
  • PR #2681: Allow Assigning to DeviceNDArrays
  • PR #2702: Make DummyArray do High Dimensional Reshapes
  • PR #2714: Use CFFI to Reuse Code

CUDA fixes:

  • PR #2667: Fix CUDA DeviceNDArray slicing
  • PR #2686: Fix #2663: incorrect offset when indexing cuda array.
  • PR #2687: Ensure Constructed Stream Bound
  • PR #2706: Workaround for unexpected warp divergence due to exception raising code
  • PR #2707: Fix regression: cuda test submodules not loading properly in runtests
  • PR #2731: Use more challenging values in slice tests.
  • PR #2720: A quick testsuite fix to not run the new cuda testcase in the multiprocess pool

Contributors:

The following people contributed to this release.

  • Coutinho Menezes Nilo
  • Daniel
  • Ehsan Totoni
  • Nick White
  • Paul H. Liu
  • Siu Kwan Lam
  • Stan Seibert
  • Stuart Archibald
  • Todd A. Anderson

10.11. Version 0.36.2

This is a bugfix release that provides minor changes to address:

  • PR #2645: Avoid CPython bug with exec in older 2.7.x.
  • PR #2652: Add support for CUDA 9.

10.12. Version 0.36.1

This release continues to add new features to the work undertaken in partnership with Intel on ParallelAccelerator technology. Other changes of note include the compilation chain being updated to use LLVM 5.0 and the production of conda packages using conda-build 3 and the new compilers that ship with it.

NOTE: A version 0.36.0 was tagged for internal use but not released.

ParallelAccelerator:

NOTE: The ParallelAccelerator technology is under active development and should be considered experimental.

New features relating to ParallelAccelerator, from work undertaken with Intel, include the addition of the @stencil decorator for ease of implementation of stencil-like computations, support for general reductions, and slice and range fusion for parallel slice/bit-array assignments. Documentation on both the use and implementation of the above has been added. Further, a new debug environment variable NUMBA_DEBUG_ARRAY_OPT_STATS is made available to give information about which operators/calls are converted to parallel for-loops.

ParallelAccelerator features:

  • PR #2457: Stencil Computations in ParallelAccelerator
  • PR #2548: Slice and range fusion, parallelizing bitarray and slice assignment
  • PR #2516: Support general reductions in ParallelAccelerator

ParallelAccelerator fixes:

  • PR #2540: Fix bug #2537
  • PR #2566: Fix issue #2564.
  • PR #2599: Fix nested multi-dimensional parfor type inference issue
  • PR #2604: Fixes for stencil tests and cmath sin().
  • PR #2605: Fixes issue #2603.

Additional features of note:

This release of Numba (and llvmlite) is updated to use LLVM version 5.0 as the compiler back end. The main change to Numba to support this was the addition of a custom symbol tracker to avoid the calls to LLVM’s ExecutionEngine that were crashing when asked for non-existent symbol addresses. Further, the conda packages for this release of Numba are built using conda-build version 3 and the new compilers/recipe grammar present in that release.

  • PR #2568: Update for LLVM 5
  • PR #2607: Fixes abort when getting address to “nrt_unresolved_abort”
  • PR #2615: Working towards conda build 3

Thanks to community feedback and bug reports, the following fixes were also made.

Misc fixes/enhancements:

  • PR #2534: Add tuple support to np.take.
  • PR #2551: Rebranding fix
  • PR #2552: relative doc links
  • PR #2570: Fix issue #2561, handle missing successor on loop exit
  • PR #2588: Fix #2555. Disable libpython.so linking on linux
  • PR #2601: Update llvmlite version dependency.
  • PR #2608: Fix potential cache file collision
  • PR #2612: Fix NRT test failure due to increased overhead when running in coverage
  • PR #2619: Fix dubious pthread_cond_signal not in lock
  • PR #2622: Fix np.nanmedian for all NaN case.
  • PR #2633: Fix markdown in CONTRIBUTING.md
  • PR #2635: Make the dependency on compilers for AOT optional.

CUDA support fixes:

  • PR #2523: Fix invalid cuda context in memory transfer calls in another thread
  • PR #2575: Use CPU to initialize xoroshiro states for GPU RNG. Fixes #2573
  • PR #2581: Fix cuda gufunc mishandling of scalar arg as array and out argument

10.13. Version 0.35.0

This release includes some exciting new features as part of the work performed in partnership with Intel on ParallelAccelerator technology. There are also some additions made to Numpy support and small but significant fixes made as a result of considerable effort spent chasing bugs and implementing stability improvements.

ParallelAccelerator:

NOTE: The ParallelAccelerator technology is under active development and should be considered experimental.

New features relating to ParallelAccelerator, from work undertaken with Intel, include support for a larger range of np.random functions in parallel mode, printing Numpy arrays in nopython mode, the ability to initialize Numpy arrays directly from list comprehensions, and the axis argument to .sum(). Documentation on the ParallelAccelerator technology implementation has also been added. Further, a large amount of work on equivalence relations was undertaken to enable runtime checks of broadcasting behaviours in parallel mode.

ParallelAccelerator features:

  • PR #2400: Array comprehension
  • PR #2405: Support printing Numpy arrays
  • PR #2438: Support more np.random functions in ParallelAccelerator
  • PR #2482: Support for sum with axis in nopython mode.
  • PR #2487: Adding developer documentation for ParallelAccelerator technology.
  • PR #2492: Core PA refactor adds assertions for broadcast semantics

ParallelAccelerator fixes:

  • PR #2478: Rename cfg before parfor translation (#2477)
  • PR #2479: Fix broken array comprehension tests on unsupported platforms
  • PR #2484: Fix array comprehension test on win64
  • PR #2506: Fix for 32-bit machines.

Additional features of note:

Support for np.take, np.finfo, np.iinfo and np.MachAr in nopython mode is added. Further, three new environment variables are added: two for overriding the CPU target/features, and another to warn if parallel=True was set but no such transform was possible.

  • PR #2490: Implement np.take and ndarray.take
  • PR #2493: Display a warning if parallel=True is set but not possible.
  • PR #2513: Add np.MachAr, np.finfo, np.iinfo
  • PR #2515: Allow environ overriding of cpu target and cpu features.

Due to expansion of the test farm and a focus on fixing bugs, the following fixes were also made.

Misc fixes/enhancements:

  • PR #2455: add contextual information to runtime errors
  • PR #2470: Fixes #2458, poor performance in np.median
  • PR #2471: Ensure LLVM threadsafety in {g,}ufunc building.
  • PR #2494: Update doc theme
  • PR #2503: Remove hacky code added in 2482 and feature enhancement
  • PR #2505: Serialise env mutation tests during multithreaded testing.
  • PR #2520: Fix failing cpu-target override tests

CUDA support fixes:

  • PR #2504: Enable CUDA toolkit version testing
  • PR #2509: Disable tests generating code unavailable in lower CC versions.
  • PR #2511: Fix Windows 64 bit CUDA tests.

10.14. Version 0.34.0

This release adds a significant set of new features arising from combined work with Intel on ParallelAccelerator technology. It also adds list comprehension and closure support, support for Numpy 1.13 and a new, faster, CUDA reduction algorithm. For Linux users this release is the first to be built on Centos 6, which will be the new base platform for future releases. Finally, a number of thread-safety and type inference bugs have been fixed, along with other smaller enhancements.

ParallelAccelerator features:

NOTE: The ParallelAccelerator technology is under active development and should be considered experimental.

The ParallelAccelerator technology is accessed via a new “nopython” mode option “parallel”. The ParallelAccelerator technology attempts to identify operations which have parallel semantics (for instance adding a scalar to a vector), fuse together adjacent such operations, and then parallelize their execution across a number of CPU cores. This is essentially auto-parallelization.

In addition to the auto-parallelization feature, explicit loop based parallelism is made available through the use of prange in place of range as a loop iterator.

More information and examples on both auto-parallelization and prange are available in the documentation and examples directory respectively.

As part of the necessary work for ParallelAccelerator, support for closures and list comprehensions is added:

  • PR #2318: Transfer ParallelAccelerator technology to Numba
  • PR #2379: ParallelAccelerator Core Improvements
  • PR #2367: Add support for len(range(…))
  • PR #2369: List comprehension
  • PR #2391: Explicit Parallel Loop Support (prange)

The ParallelAccelerator features are available on all supported platforms and Python versions, with the following exceptions (support is planned for a future release):

  • The combination of Windows operating systems with Python 2.7.
  • Systems running 32 bit Python.

CUDA support enhancements:

  • PR #2377: New GPU reduction algorithm

CUDA support fixes:

  • PR #2397: Fix #2393, always set alignment of cuda static memory regions

Misc Fixes:

  • PR #2373, Issue #2372: 32-bit compatibility fix for parfor related code
  • PR #2376: Fix #2375 missing stdint.h for py2.7 vc9
  • PR #2378: Fix deadlock in parallel gufunc when kernel acquires the GIL.
  • PR #2382: Forbid unsafe casting in bitwise operation
  • PR #2385: docs: fix Sphinx errors
  • PR #2396: Use 64-bit RHS operand for shift
  • PR #2404: Fix threadsafety logic issue in ufunc compilation cache.
  • PR #2424: Ensure consistent iteration order of blocks for type inference.
  • PR #2425: Guard code to prevent the use of ‘parallel’ on win32 + py27
  • PR #2426: Basic test for Enum member type recovery.
  • PR #2433: Fix up the parfors tests with respect to windows py2.7
  • PR #2442: Skip tests that need BLAS/LAPACK if scipy is not available.
  • PR #2444: Add test for invalid array setitem
  • PR #2449: Make the runtime initialiser threadsafe
  • PR #2452: Skip CFG test on 64bit windows

Misc Enhancements:

  • PR #2366: Improvements to IR utils
  • PR #2388: Update README.rst to indicate the proper version of LLVM
  • PR #2394: Upgrade to llvmlite 0.19.*
  • PR #2395: Update llvmlite version to 0.19
  • PR #2406: Expose environment object to ufuncs
  • PR #2407: Expose environment object to target-context inside lowerer
  • PR #2413: Add flags to pass through to conda build for buildbot
  • PR #2414: Add cross compile flags to local recipe
  • PR #2415: A few cleanups for rewrites
  • PR #2418: Add getitem support for Enum classes
  • PR #2419: Add support for returning enums in vectorize
  • PR #2421: Add copyright notice for Intel contributed files.
  • PR #2422: Patch code base to work with np 1.13 release
  • PR #2448: Adds in warning message when using ‘parallel’ if cache=True
  • PR #2450: Add test for keyword arg on .sum-like and .cumsum-like array methods

10.15. Version 0.33.0

This release resolves several performance issues caused by atomic reference counting operations inside loop bodies. New optimization passes have been added to reduce the impact of these operations. We observe speed improvements between 2x and 10x in affected programs due to the removal of unnecessary reference counting operations.

There are also several enhancements to the CUDA GPU support:

  • A GPU random number generator based on xoroshiro128+ algorithm is added. See details and examples in documentation.
  • @cuda.jit CUDA kernels can now call @jit and @njit CPU functions and they will automatically be compiled as CUDA device functions.
  • CUDA IPC memory API is exposed for sharing memory between processes. See usage details in documentation.

Reference counting enhancements:

  • PR #2346, Issue #2345, #2248: Add extra refcount pruning after inlining
  • PR #2349: Fix refct pruning not removing refct op with tail call.
  • PR #2352, Issue #2350: Add refcount pruning pass for function that does not need refcount

CUDA support enhancements:

  • PR #2023: Supports CUDA IPC for device array
  • PR #2343, Issue #2335: Allow CPU jit decorated function to be used as cuda device function
  • PR #2347: Add random number generator support for CUDA device code
  • PR #2361: Update autotune table for CC: 5.3, 6.0, 6.1, 6.2

Misc fixes:

  • PR #2362: Avoid test failure due to typing to int32 on 32-bit platforms
  • PR #2359: Fixed nogil example that threw a TypeError when executed.
  • PR #2357, Issue #2356: Fix fragile test that depends on how the script is executed.
  • PR #2355: Fix cpu dispatcher referenced as attribute of another module
  • PR #2354: Fixes an issue with caching when function needs NRT and refcount pruning
  • PR #2342, Issue #2339: Add warnings to inspection when it is used on unserialized cached code
  • PR #2329, Issue #2250: Better handling of missing op codes

Misc enhancements:

  • PR #2360: Adds missing values in error message interp.
  • PR #2353: Handle when get_host_cpu_features() raises RuntimeError
  • PR #2351: Enable SVML for erf/erfc/gamma/lgamma/log2
  • PR #2344: Expose error_model setting in jit decorator
  • PR #2337: Align blocking terminate support for fork() with new TBB version
  • PR #2336: Bump llvmlite version to 0.18
  • PR #2330: Core changes in PR #2318

10.16. Version 0.32.0

In this release, we are upgrading to LLVM 4.0. A lot of work has been done to fix many race-condition issues inside LLVM when the compiler is used concurrently, which is likely when Numba is used with Dask.

Improvements:

  • PR #2322: Suppress test error due to unknown but consistent error with tgamma
  • PR #2320: Update llvmlite dependency to 0.17
  • PR #2308: Add details to error message on why cuda support is disabled.
  • PR #2302: Add os x to travis
  • PR #2294: Disable remove_module on MCJIT due to memory leak inside LLVM
  • PR #2291: Split parallel tests and recycle workers to tame memory usage
  • PR #2253: Remove the pointer-stuffing hack for storing meminfos in lists

Fixes:

  • PR #2331: Fix a bug in the GPU array indexing
  • PR #2326: Fix #2321 docs referring to non-existing function.
  • PR #2316: Fixing more race-condition problems
  • PR #2315: Fix #2314. Relax strict type check to allow optional type.
  • PR #2310: Fix race condition due to concurrent compilation and cache loading
  • PR #2304: Fix intrinsic 1st arg not a typing.Context as stated by the docs.
  • PR #2287: Fix int64 atomic min-max
  • PR #2286: Fix #2285 @overload_method not linking dependent libs
  • PR #2303: Missing import statements to interval-example.rst

10.17. Version 0.31.0

In this release, we added preliminary support for debugging with GDB version >= 7.0. The feature is enabled by setting the debug=True compiler option, which causes GDB-compatible debug info to be generated. The CUDA backend also gained limited debugging support so that source locations are shown in memory-checking and profiling tools. For details, see Troubleshooting and tips.

Also, we added the fastmath=True compiler option to enable unsafe floating-point transformations, which allows LLVM to auto-vectorize more code.

Other important changes include upgrading to LLVM 3.9.1 and adding support for Numpy 1.12.

Improvements:

  • PR #2281: Update for numpy1.12
  • PR #2278: Add CUDA atomic.{max, min, compare_and_swap}
  • PR #2277: Add about section to conda recipes to identify license and other metadata in Anaconda Cloud
  • PR #2271: Adopt itanium C++-style mangling for CPU and CUDA targets
  • PR #2267: Add fastmath flags
  • PR #2261: Support dtype.type
  • PR #2249: Changes for llvm3.9
  • PR #2234: Bump llvmlite requirement to 0.16 and add install_name_tool_fixer to mviewbuf for OS X
  • PR #2230: Add python3.6 to TravisCi
  • PR #2227: Enable caching for gufunc wrapper
  • PR #2170: Add debugging support
  • PR #2037: inspect_cfg() for easier visualization of the function operation

Fixes:

  • PR #2274: Fix nvvm ir patch in mishandling “load”
  • PR #2272: Fix breakage to cuda7.5
  • PR #2269: Fix caching of copy_strides kernel in cuda.reduce
  • PR #2265: Fix #2263: error when linking two modules with dynamic globals
  • PR #2252: Fix path separator in test
  • PR #2246: Fix overuse of memory in some system with fork
  • PR #2241: Fix #2240: __module__ in dynamically created function not a str
  • PR #2239: Fix fingerprint computation failure preventing fallback

10.18. Version 0.30.1

This is a bug-fix release to enable Python 3.6 support. In addition, there is now early Intel TBB support for parallel ufuncs when building from source with TBBROOT defined. The TBB feature is not enabled in our official builds.

Fixes:

  • PR #2232: Fix name clashes with _Py_hashtable_xxx in Python 3.6.

Improvements:

  • PR #2217: Add Intel TBB threadpool implementation for parallel ufunc.

10.19. Version 0.30.0

This release adds preliminary support for Python 3.6, but no official build is available yet. A new system reporting tool (numba --sysinfo) is added to provide system information to help core developers in replication and debugging. See below for other improvements and bug fixes.

Improvements:

  • PR #2209: Support Python 3.6.
  • PR #2175: Support np.trace(), np.outer() and np.kron().
  • PR #2197: Support np.nanprod().
  • PR #2190: Support caching for ufunc.
  • PR #2186: Add system reporting tool.

Fixes:

  • PR #2214, Issue #2212: Fix memory error with ndenumerate and flat iterators.
  • PR #2206, Issue #2163: Fix zip() consuming extra elements in early exhaustion.
  • PR #2185, Issue #2159, #2169: Fix rewrite pass affecting objmode fallback.
  • PR #2204, Issue #2178: Fix annotation for liftedloop.
  • PR #2203: Fix Appveyor segfault with Python 3.5.
  • PR #2202, Issue #2198: Fix target context not initialized when loading from ufunc cache.
  • PR #2172, Issue #2171: Fix optional type unpacking.
  • PR #2189, Issue #2188: Disable freezing of big (>1MB) global arrays.
  • PR #2180, Issue #2179: Fix invalid variable version in looplifting.
  • PR #2156, Issue #2155: Fix divmod, floordiv segfault on CUDA.

10.20. Version 0.29.0

This release extends the support of recursive functions to include direct and indirect recursion without explicit function type annotations. See new example in examples/mergesort.py. Newly supported numpy features include array stacking functions, np.linalg.eig* functions, np.linalg.matrix_power, np.roots and array to array broadcasting in assignments.

This release depends on llvmlite 0.14.0 and supports CUDA 8, though CUDA 8 is not required.

Improvements:

  • PR #2130, #2137: Add type-inferred recursion with docs and examples.
  • PR #2134: Add np.linalg.matrix_power.
  • PR #2125: Add np.roots.
  • PR #2129: Add np.linalg.{eigvals,eigh,eigvalsh}.
  • PR #2126: Add array-to-array broadcasting.
  • PR #2069: Add hstack and related functions.
  • PR #2128: Allow for vectorizing a jitted function. (thanks to @dhirschfeld)
  • PR #2117: Update examples and make them test-able.
  • PR #2127: Refactor interpreter class and its results.

Fixes:

  • PR #2149: Workaround MSVC9.0 SP1 fmod bug kb982107.
  • PR #2145, Issue #2009: Fixes kwargs for jitclass __init__ method.
  • PR #2150: Fix slowdown in objmode fallback.
  • PR #2050, Issue #1259: Fix liveness problem with some generator loops.
  • PR #2072, Issue #1995: Right shift of unsigned LHS should be logical.
  • PR #2115, Issue #1466: Fix inspect_types() error due to mangled variable name.
  • PR #2119, Issue #2118: Fix array type created from record-dtype.
  • PR #2122, Issue #1808: Fix returning a generator due to datamodel error.

10.21. Version 0.28.1

This is a bug-fix release to resolve packaging issues with setuptools dependency.

10.22. Version 0.28.0

Amongst other improvements, this version improves again the level of support for linear algebra – functions from the numpy.linalg module. Also, our random generator is now guaranteed to be thread-safe and fork-safe.

Improvements:

  • PR #2019: Add the @intrinsic decorator to define low-level subroutines callable from JIT functions (this is considered a private API for now).
  • PR #2059: Implement np.concatenate and np.stack.
  • PR #2048: Make random generation fork-safe and thread-safe, producing independent streams of random numbers for each thread or process.
  • PR #2031: Add documentation of floating-point pitfalls.
  • Issue #2053: Avoid polling in parallel CPU target (fixes severe performance regression on Windows).
  • Issue #2029: Make default arguments fast.
  • PR #2052: Add logging to the CUDA driver.
  • PR #2049: Implement the built-in divmod() function.
  • PR #2036: Implement the argsort() method on arrays.
  • PR #2046: Improving CUDA memory management by deferring deallocations until certain thresholds are reached, so as to avoid breaking asynchronous execution.
  • PR #2040: Switch the CUDA driver implementation to use CUDA’s “primary context” API.
  • PR #2017: Allow min(tuple) and max(tuple).
  • PR #2039: Reduce fork() detection overhead in CUDA.
  • PR #2021: Handle structured dtypes with titles.
  • PR #1996: Rewrite looplifting as a transformation on Numba IR.
  • PR #2014: Implement np.linalg.matrix_rank.
  • PR #2012: Implement np.linalg.cond.
  • PR #1985: Rewrite even trivial array expressions, which opens the door for other optimizations (for example, array ** 2 can be converted into array * array).
  • PR #1950: Have typeof() always raise ValueError on failure. Previously, it would either raise or return None, depending on the input.
  • PR #1994: Implement np.linalg.norm.
  • PR #1987: Implement np.linalg.det and np.linalg.slogdet.
  • Issue #1979: Document integer width inference and how to workaround.
  • PR #1938: Numba is now compatible with LLVM 3.8.
  • PR #1967: Restrict np.linalg functions to homogeneous dtypes. Users wanting to pass mixed-typed inputs have to convert explicitly, which makes the performance implications more obvious.

Fixes:

  • PR #2006: array(float32) ** int should return array(float32).
  • PR #2044: Allow reshaping empty arrays.
  • Issue #2051: Fix refcounting issue when concatenating tuples.
  • Issue #2000: Make Numpy optional for setup.py, to allow pip install to work without Numpy pre-installed.
  • PR #1989: Fix assertion in Dispatcher.disable_compile().
  • Issue #2028: Ignore filesystem errors when caching from multiple processes.
  • Issue #2003: Allow unicode variable and function names (on Python 3).
  • Issue #1998: Fix deadlock in parallel ufuncs that reacquire the GIL.
  • PR #1997: Fix random crashes when AOT compiling on certain Windows platforms.
  • Issue #1988: Propagate jitclass docstring.
  • Issue #1933: Ensure array constants are emitted with the right alignment.

10.23. Version 0.27.0

Improvements:

  • Issue #1976: improve error message when non-integral dimensions are given to a CUDA kernel.
  • PR #1970: Optimize the power operator with a static exponent.
  • PR #1710: Improve contextual information for compiler errors.
  • PR #1961: Support printing constant strings.
  • PR #1959: Support more types in the print() function.
  • PR #1823: Support compute_50 in CUDA backend.
  • PR #1955: Support np.linalg.pinv.
  • PR #1896: Improve the SmartArray API.
  • PR #1947: Support np.linalg.solve.
  • Issue #1943: Improve error message when an argument fails typing.
  • PR #1927: Support np.linalg.lstsq.
  • PR #1934: Use system functions for hypot() where possible, instead of our own implementation.
  • PR #1929: Add cffi support to @cfunc objects.
  • PR #1932: Add user-controllable thread pool limits for parallel CPU target.
  • PR #1928: Support self-recursion when the signature is explicit.
  • PR #1890: List all lowering implementations in the developer docs.
  • Issue #1884: Support np.lib.stride_tricks.as_strided().

Fixes:

  • Issue #1960: Fix sliced assignment when source and destination areas are overlapping.
  • PR #1963: Make CUDA print() atomic.
  • PR #1956: Allow 0d array constants.
  • Issue #1945: Allow using Numpy ufuncs in AOT compiled code.
  • Issue #1916: Fix documentation example for @generated_jit.
  • Issue #1926: Fix regression when caching functions in an IPython session.
  • Issue #1923: Allow non-intp integer arguments to carray() and farray().
  • Issue #1908: Accept non-ASCII unicode docstrings on Python 2.
  • Issue #1874: Allow del container[key] in object mode.
  • Issue #1913: Fix set insertion bug when the lookup chain contains deleted entries.
  • Issue #1911: Allow function annotations on jitclass methods.

10.24. Version 0.26.0

This release adds support for the @cfunc decorator, which exports Numba-jitted functions to third-party APIs that take C callbacks. Most of the overhead of using jitclasses inside the interpreter is eliminated. Support for decompositions in numpy.linalg is added. Finally, Numpy 1.11 is supported.

Improvements:

  • PR #1889: Export BLAS and LAPACK wrappers for pycc.
  • PR #1888: Faster array power.
  • Issue #1867: Allow “out” keyword arg for dufuncs.
  • PR #1871: carray() and farray() for creating arrays from pointers.
  • PR #1855: @cfunc decorator for exporting as ctypes function.
  • PR #1862: Add support for numpy.linalg.qr.
  • PR #1851: jitclass support for ‘_’ and ‘__’ prefixed attributes.
  • PR #1842: Optimize jitclass in Python interpreter.
  • Issue #1837: Fix CUDA simulator issues with device function.
  • PR #1839: Add support for decompositions from numpy.linalg.
  • PR #1829: Support Python enums.
  • PR #1828: Add support for numpy.random.rand() and numpy.random.randn()
  • Issue #1825: Use of 0-darray in place of scalar index.
  • Issue #1824: Scalar arguments to object mode gufuncs.
  • Issue #1813: Let bitwise bool operators return booleans, not integers.
  • Issue #1760: Optional arguments in generators.
  • PR #1780: Numpy 1.11 support.

10.25. Version 0.25.0

This release adds support for set objects in nopython mode. It also adds support for many missing Numpy features and functions. It improves Numba’s compatibility and performance when using a distributed execution framework such as dask, distributed or Spark. Finally, it removes compatibility with Python 2.6, Python 3.3 and Numpy 1.6.

Improvements:

  • Issue #1800: Add erf(), erfc(), gamma() and lgamma() to CUDA targets.
  • PR #1793: Implement more Numpy functions: np.bincount(), np.diff(), np.digitize(), np.histogram(), np.searchsorted() as well as NaN-aware reduction functions (np.nansum(), np.nanmedian(), etc.)
  • PR #1789: Optimize some reduction functions such as np.sum(), np.prod(), np.median(), etc.
  • PR #1752: Make CUDA features work in dask, distributed and Spark.
  • PR #1787: Support np.nditer() for fast multi-array indexing with broadcasting.
  • PR #1799: Report JIT-compiled functions as regular Python functions when profiling (allowing to see the filename and line number where a function is defined).
  • PR #1782: Support np.any() and np.all().
  • Issue #1788: Support the iter() and next() built-in functions.
  • PR #1778: Support array.astype().
  • Issue #1775: Allow the user to set the target CPU model for AOT compilation.
  • PR #1758: Support creating random arrays using the size parameter to the np.random APIs.
  • PR #1757: Support len() on array.flat objects.
  • PR #1749: Remove Numpy 1.6 compatibility.
  • PR #1748: Remove Python 2.6 and 3.3 compatibility.
  • PR #1735: Support the not in operator as well as operator.contains().
  • PR #1724: Support homogeneous sets in nopython mode.
  • Issue #875: make compilation of array constants faster.

Fixes:

  • PR #1795: Fix a massive performance issue when calling Numba functions with distributed, Spark or a similar mechanism using serialization.
  • Issue #1784: Make jitclasses usable with NUMBA_DISABLE_JIT=1.
  • Issue #1786: Allow using linear algebra functions when profiling.
  • Issue #1796: Fix np.dot() memory leak on non-contiguous inputs.
  • PR #1792: Fix static negative indexing of tuples.
  • Issue #1771: Use fallback cache directory when __pycache__ isn’t writable, such as when user code is installed in a system location.
  • Issue #1223: Use Numpy error model in array expressions (e.g. division by zero returns inf or nan instead of raising an error).
  • Issue #1640: Fix np.random.binomial() for large n values.
  • Issue #1643: Improve error reporting when passing an invalid spec to @jitclass.
  • PR #1756: Fix slicing with a negative step and an omitted start.

10.26. Version 0.24.0

This release introduces several major changes, including the @generated_jit decorator for flexible specializations, similar to Julia’s “@generated” macro, and the SmartArray array wrapper type that allows seamless transfer of array data between the CPU and the GPU.

This will be the last version to support Python 2.6, Python 3.3 and Numpy 1.6.

Improvements:

  • PR #1723: Improve compatibility of JIT functions with the Python profiler.
  • PR #1509: Support array.ravel() and array.flatten().
  • PR #1676: Add SmartArray type to support transparent data management in multiple address spaces (host & GPU).
  • PR #1689: Reduce startup overhead of importing Numba.
  • PR #1705: Support registration of CFFI types as corresponding to known Numba types.
  • PR #1686: Document the extension API.
  • PR #1698: Improve warnings raised during type inference.
  • PR #1697: Support np.dot() and friends on non-contiguous arrays.
  • PR #1692: cffi.from_buffer() improvements (allow more pointer types, allow non-Numpy buffer objects).
  • PR #1648: Add the @generated_jit decorator.
  • PR #1651: Implementation of np.linalg.inv using LAPACK. Thanks to Matthieu Dartiailh.
  • PR #1674: Support np.diag().
  • PR #1673: Improve error message when looking up an attribute on an unknown global.
  • Issue #1569: Implement runtime check for the LLVM locale bug.
  • PR #1612: Switch to LLVM 3.7 in sync with llvmlite.
  • PR #1624: Allow slice assignment of sequence to array.
  • PR #1622: Support slicing tuples with a constant slice.

Fixes:

  • Issue #1722: Fix returning an optional boolean (bool or None).
  • Issue #1734: NRT decref bug when variable is del’ed before being defined, leading to a possible memory leak.
  • PR #1732: Fix tuple getitem regression for CUDA target.
  • PR #1718: Mishandling of optional to optional casting.
  • PR #1714: Fix .compile() on a JIT function not respecting ._can_compile.
  • Issue #1667: Fix np.angle() on arrays.
  • Issue #1690: Fix slicing with an omitted stop and a negative step value.
  • PR #1693: Fix gufunc bug in handling scalar formal arg with non-scalar input value.
  • PR #1683: Fix parallel testing under Windows.
  • Issue #1616: Use system-provided versions of C99 math where possible.
  • Issue #1652: Reductions of bool arrays (e.g. sum() or mean()) should return integers or floats, not bools.
  • Issue #1664: Fix regression when indexing a record array with a constant index.
  • PR #1661: Disable AVX on old Linux kernels.
  • Issue #1636: Allow raising an exception looked up on a module.

10.27. Version 0.23.1

This is a bug-fix release to address several regressions introduced in the 0.23.0 release, and a couple other issues.

Fixes:

  • Issue #1645: CUDA ufuncs were broken in 0.23.0.
  • Issue #1638: Check tuple sizes when passing a list of tuples.
  • Issue #1630: Parallel ufunc would keep eating CPU even after finishing under Windows.
  • Issue #1628: Fix ctypes and cffi tests under Windows with Python 3.5.
  • Issue #1627: Fix xrange() support.
  • PR #1611: Rewrite variable liveness analysis.
  • Issue #1610: Allow nested calls between explicitly-typed ufuncs.
  • Issue #1593: Fix *args in object mode.

10.28. Version 0.23.0

This release introduces JIT classes using the new @jitclass decorator, allowing user-defined structures for nopython mode. Other improvements and bug fixes are listed below.

Improvements:

  • PR #1609: Speed up some simple math functions by inlining them in their caller
  • PR #1571: Implement JIT classes
  • PR #1584: Improve typing of array indexing
  • PR #1583: Allow printing booleans
  • PR #1542: Allow negative values in np.reshape()
  • PR #1560: Support vector and matrix dot product, including np.dot() and the @ operator in Python 3.5
  • PR #1546: Support field lookup on record arrays and scalars (i.e. array['field'] in addition to array.field)
  • PR #1440: Support the HSA wavebarrier() and activelanepermute_wavewidth() intrinsics
  • PR #1540: Support np.angle()
  • PR #1543: Implement CPU multithreaded gufuncs (target=”parallel”)
  • PR #1551: Allow scalar arguments in np.where(), np.empty_like().
  • PR #1516: Add some more examples from NumbaPro
  • PR #1517: Support np.sinc()

Fixes:

  • Issue #1603: Fix calling a non-cached function from a cached function
  • Issue #1594: Ensure a list is homogeneous when unboxing
  • Issue #1595: Replace deprecated use of get_pointer_to_function()
  • Issue #1586: Allow tests to be run by different users on the same machine
  • Issue #1587: Make CudaAPIError picklable
  • Issue #1568: Fix using Numba from inside Visual Studio 2015
  • Issue #1559: Fix serializing a jit function referring a renamed module
  • PR #1508: Let reshape() accept integer argument(s), not just a tuple
  • Issue #1545: Improve error checking when unboxing list objects
  • Issue #1538: Fix array broadcasting in CUDA gufuncs
  • Issue #1526: Fix a reference count handling bug

10.29. Version 0.22.1

This is a bug-fix release to resolve some packaging issues and other problems found in the 0.22.0 release.

Fixes:

  • PR #1515: Include MANIFEST.in in MANIFEST.in so that sdist still works from source tar files.
  • PR #1518: Fix reference counting bug caused by hidden alias
  • PR #1519: Fix erroneous assert when passing nopython=True to guvectorize.
  • PR #1521: Fix cuda.test()

10.30. Version 0.22.0

This release features several highlights: Python 3.5 support, Numpy 1.10 support, Ahead-of-Time compilation of extension modules, additional vectorization features that were previously only available with the proprietary extension NumbaPro, and improvements in array indexing.

Improvements:

  • PR #1497: Allow scalar input type instead of size-1 array to @guvectorize
  • PR #1480: Add distutils support for AOT compilation
  • PR #1460: Create a new API for Ahead-of-Time (AOT) compilation
  • PR #1451: Allow passing Python lists to JIT-compiled functions, and reflect mutations on function return
  • PR #1387: Numpy 1.10 support
  • PR #1464: Support cffi.FFI.from_buffer()
  • PR #1437: Propagate errors raised from Numba-compiled ufuncs; also, let “division by zero” and other math errors produce a warning instead of exiting the function early
  • PR #1445: Support a subset of fancy indexing
  • PR #1454: Support “out-of-line” CFFI modules
  • PR #1442: Improve array indexing to support more kinds of basic slicing
  • PR #1409: Support explicit CUDA memory fences
  • PR #1435: Add support for vectorize() and guvectorize() with HSA
  • PR #1432: Implement numpy.nonzero() and numpy.where()
  • PR #1416: Add support for vectorize() and guvectorize() with CUDA, as originally provided in NumbaPro
  • PR #1424: Support in-place array operators
  • PR #1414: Python 3.5 support
  • PR #1404: Add the parallel ufunc functionality originally provided in NumbaPro
  • PR #1393: Implement sorting on arrays and lists
  • PR #1415: Add functions to estimate the occupancy of a CUDA kernel
  • PR #1360: The JIT cache now stores the compiled object code, yielding even larger speedups.
  • PR #1402: Fixes for the ARMv7 (armv7l) architecture under Linux
  • PR #1400: Add the cuda.reduce() decorator originally provided in NumbaPro

Fixes:

  • PR #1483: Allow np.empty_like() and friends on non-contiguous arrays
  • Issue #1471: Allow caching JIT functions defined in IPython
  • PR #1457: Fix flat indexing of boolean arrays
  • PR #1421: Allow calling Numpy ufuncs, without an explicit output, on non-contiguous arrays
  • Issue #1411: Fix crash when unpacking a tuple containing a Numba-allocated array
  • Issue #1394: Allow unifying range_state32 and range_state64
  • Issue #1373: Fix code generation error on lists of bools

10.31. Version 0.21.0

This release introduces support for AMD’s Heterogeneous System Architecture, which allows memory to be shared directly between the CPU and the GPU. Other major enhancements are support for lists and the introduction of an opt-in compilation cache.

Improvements:

  • PR #1391: Implement print() for CUDA code
  • PR #1366: Implement integer typing enhancement proposal (NBEP 1)
  • PR #1380: Support the one-argument type() builtin
  • PR #1375: Allow boolean evaluation of lists and tuples
  • PR #1371: Support array.view() in CUDA mode
  • PR #1369: Support named tuples in nopython mode
  • PR #1250: Implement numpy.median().
  • PR #1289: Make dispatching faster when calling a JIT-compiled function from regular Python
  • Issue #1226: Improve performance of integer power
  • PR #1321: Document features supported with CUDA
  • PR #1345: HSA support
  • PR #1343: Support lists in nopython mode
  • PR #1356: Make Numba-allocated memory visible to tracemalloc
  • PR #1363: Add an environment variable NUMBA_DEBUG_TYPEINFER
  • PR #1051: Add an opt-in, per-function compilation cache

Fixes:

  • Issue #1372: Some array expressions would fail rewriting when they involved the same variable more than once, or a unary operator
  • Issue #1385: Allow CUDA local arrays to be declared anywhere in a function
  • Issue #1285: Support datetime64 and timedelta64 in Numpy reduction functions
  • Issue #1332: Handle the EXTENDED_ARG opcode.
  • PR #1329: Handle the in operator in object mode
  • Issue #1322: Fix augmented slice assignment on Python 2
  • PR #1357: Fix slicing with some negative bounds or step values.

10.32. Version 0.20.0

This release updates Numba to use LLVM 3.6 and CUDA 7 for CUDA support. Following the platform deprecation in CUDA 7, Numba’s CUDA feature is no longer supported on 32-bit platforms. The oldest supported version of Windows is Windows 7.

Improvements:

  • Issue #1203: Support indexing ndarray.flat
  • PR #1200: Migrate cgutils to llvmlite
  • PR #1190: Support more array methods: .transpose(), .T, .copy(), .reshape(), .view()
  • PR #1214: Simplify setup.py and avoid manual maintenance
  • PR #1217: Support datetime64 and timedelta64 constants
  • PR #1236: Reload environment variables when compiling
  • PR #1225: Various speed improvements in generated code
  • PR #1252: Support cmath module in CUDA
  • PR #1238: Use 32-byte aligned allocator to optimize for AVX
  • PR #1258: Support numpy.frombuffer()
  • PR #1274: Use TravisCI container infrastructure for lower wait time
  • PR #1279: Micro-optimize overload resolution in call dispatch
  • Issue #1248: Improve error message when return type unification fails

Fixes:

  • Issue #1131: Handling of negative zeros in np.conjugate() and np.arccos()
  • Issue #1188: Fix slow array return
  • Issue #1164: Avoid warnings from CUDA context at shutdown
  • Issue #1229: Respect the writeable flag in arrays
  • Issue #1244: Fix bug in refcount pruning pass
  • Issue #1251: Fix partial left-indexing of Fortran contiguous array
  • Issue #1264: Fix compilation error in array expression
  • Issue #1254: Fix error when yielding array objects
  • Issue #1276: Fix nested generator use

10.33. Version 0.19.2

This release fixes the source distribution on PyPI. The only change is in the setup.py file. We do not plan to provide a conda package as this release is essentially the same as 0.19.1 for conda users.

10.34. Version 0.19.1

  • Issue #1196:
    • fix double-free segfault due to redundant variable deletion in the Numba IR (#1195)
    • fix use-after-delete in array expression rewrite pass

10.35. Version 0.19.0

This version introduces memory management in the Numba runtime, allowing new arrays to be allocated inside Numba-compiled functions. There is also a rework of the ufunc infrastructure, and an optimization pass to collapse cascading array operations into a single efficient loop.
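The “collapse cascading array operations into a single efficient loop” optimization can be illustrated in plain Python (a sketch of the idea only; Numba performs this on typed arrays in compiled code): without fusion each operation materializes a temporary, while the fused form runs the whole expression in one loop.

```python
def naive_expr(a, b, c):
    # Unfused evaluation of a + b * c: the multiplication materializes
    # a temporary list before the addition runs.
    tmp = [y * z for y, z in zip(b, c)]
    return [x + t for x, t in zip(a, tmp)]

def fused_expr(a, b, c):
    # Fused evaluation: a single loop, no intermediate allocation.
    return [x + y * z for x, y, z in zip(a, b, c)]
```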

Warning

Support for Windows XP and Vista with all compiler targets and support for 32-bit platforms (Win/Mac/Linux) with the CUDA compiler target are deprecated. In the next release of Numba, the oldest version of Windows supported will be Windows 7. CPU compilation will remain supported on 32-bit Linux and Windows platforms.

Known issues:

  • There are some performance regressions in very short running nopython functions due to the additional overhead incurred by memory management. We will work to reduce this overhead in future releases.

Features:

  • Issue #1181: Add a Frequently Asked Questions section to the documentation.
  • Issue #1162: Support the cumsum() and cumprod() methods on Numpy arrays.
  • Issue #1152: Support the *args argument-passing style.
  • Issue #1147: Allow passing character sequences as arguments to JIT-compiled functions.
  • Issue #1110: Shortcut deforestation and loop fusion for array expressions.
  • Issue #1136: Support various Numpy array constructors, for example numpy.zeros() and numpy.zeros_like().
  • Issue #1127: Add a CUDA simulator running on the CPU, enabled with the NUMBA_ENABLE_CUDASIM environment variable.
  • Issue #1086: Allow calling standard Numpy ufuncs without an explicit output array from nopython functions.
  • Issue #1113: Support keyword arguments when calling numpy.empty() and related functions.
  • Issue #1108: Support the ctypes.data attribute of Numpy arrays.
  • Issue #1077: Memory management for array allocations in nopython mode.
  • Issue #1105: Support calling a ctypes function that takes ctypes.py_object parameters.
  • Issue #1084: Environment variable NUMBA_DISABLE_JIT disables compilation of @jit functions, instead calling into the Python interpreter when called. This allows easier debugging of multiple jitted functions.
  • Issue #927: Allow gufuncs with no output array.
  • Issue #1097: Support comparisons between tuples.
  • Issue #1075: Numba-generated ufuncs can now be called from nopython functions.
  • Issue #1062: @vectorize now allows omitting the signatures, and will compile the required specializations on the fly (like @jit does).
  • Issue #1027: Support numpy.round().
  • Issue #1085: Allow returning a character sequence (as fetched from a structured array) from a JIT-compiled function.

Fixes:

  • Issue #1170: Ensure ndindex(), ndenumerate() and ndarray.flat work properly inside generators.
  • Issue #1151: Disallow unpacking of tuples with the wrong size.
  • Issue #1141: Specify install dependencies in setup.py.
  • Issue #1106: Loop-lifting would fail when the lifted loop does not produce any output values for the function tail.
  • Issue #1103: Fix mishandling of some inputs when a JIT-compiled function is called with multiple array layouts.
  • Issue #1089: Fix range() with large unsigned integers.
  • Issue #1088: Install entry-point scripts (numba, pycc) from the conda build recipe.
  • Issue #1081: Constant structured scalars now work properly.
  • Issue #1080: Fix automatic promotion of booleans to integers.

10.36. Version 0.18.2

Bug fixes:

  • Issue #1073: Fixes missing template file for HTML annotation
  • Issue #1074: Fixes CUDA support on Windows machine due to NVVM API mismatch

10.37. Version 0.18.1

Version 0.18.0 is not officially released.

This version removes the old deprecated and undocumented argtypes and restype arguments to the @jit decorator. Function signatures should always be passed as the first argument to @jit.

Features:

  • Issue #960: Add inspect_llvm() and inspect_asm() methods to JIT-compiled functions: they output the LLVM IR and the native assembler source of the compiled function, respectively.
  • Issue #990: Allow passing tuples as arguments to JIT-compiled functions in nopython mode.
  • Issue #774: Support two-argument round() in nopython mode.
  • Issue #987: Support missing functions from the math module in nopython mode: frexp(), ldexp(), gamma(), lgamma(), erf(), erfc().
  • Issue #995: Improve code generation for round() on Python 3.
  • Issue #981: Support functions from the random and numpy.random modules in nopython mode.
  • Issue #979: Add cuda.atomic.max().
  • Issue #1006: Improve exception raising and reporting. It is now allowed to raise an exception with an error message in nopython mode.
  • Issue #821: Allow ctypes- and cffi-defined functions as arguments to nopython functions.
  • Issue #901: Allow multiple explicit signatures with @jit. The signatures must be passed in a list, as with @vectorize.
  • Issue #884: Better error message when a JIT-compiled function is called with the wrong types.
  • Issue #1010: Simpler and faster CUDA argument marshalling thanks to a refactoring of the data model.
  • Issue #1018: Support arrays of scalars inside Numpy structured types.
  • Issue #808: Reduce Numba import time by half.
  • Issue #1021: Support the buffer protocol in nopython mode. Buffer-providing objects, such as bytearray, array.array or memoryview support array-like operations such as indexing and iterating. Furthermore, some standard attributes on the memoryview object are supported.
  • Issue #1030: Support nested arrays in Numpy structured arrays.
  • Issue #1033: Implement the inspect_types(), inspect_llvm() and inspect_asm() methods for CUDA kernels.
  • Issue #1029: Support Numpy structured arrays with CUDA as well.
  • Issue #1034: Support for generators in nopython and object mode.
  • Issue #1044: Support default argument values when calling Numba-compiled functions.
  • Issue #1048: Allow calling Numpy scalar constructors from CUDA functions.
  • Issue #1047: Allow indexing a multi-dimensional array with a single integer, to take a view.
  • Issue #1050: Support len() on tuples.
  • Issue #1011: Revive HTML annotation.

Fixes:

  • Issue #977: Assignment optimization was too aggressive.
  • Issue #561: One-argument round() now returns an int on Python 3.
  • Issue #1001: Fix an unlikely bug where two closures with the same name and id() would compile to the same LLVM function name, despite different closure values.
  • Issue #1006: Fix reference leak when a JIT-compiled function is disposed of.
  • Issue #1017: Update instructions for CUDA in the README.
  • Issue #1008: Generate shorter LLVM type names to avoid segfaults with CUDA.
  • Issue #1005: Properly clean up references when raising an exception from object mode.
  • Issue #1041: Fix incompatibility between Numba and the third-party library “future”.
  • Issue #1053: Fix the size attribute of CUDA shared arrays.

10.38. Version 0.17.0

The major focus in this release has been a rewrite of the documentation. The new documentation is better structured and has more detailed coverage of Numba features and APIs. It can be found online at http://numba.pydata.org/numba-doc/dev/index.html

Features:

  • Issue #895: LLVM can now inline nested function calls in nopython mode.
  • Issue #863: CUDA kernels can now infer the types of their arguments (“autojit”-like).
  • Issue #833: Support numpy.{min,max,argmin,argmax,sum,mean,var,std} in nopython mode.
  • Issue #905: Add a nogil argument to the @jit decorator, to release the GIL in nopython mode.
  • Issue #829: Add an identity argument to @vectorize and @guvectorize, to set the identity value of the ufunc.
  • Issue #843: Allow indexing 0-d arrays with the empty tuple.
  • Issue #933: Allow named arguments, not only positional arguments, when calling a Numba-compiled function.
  • Issue #902: Support numpy.ndenumerate() in nopython mode.
  • Issue #950: AVX is now enabled by default except on Sandy Bridge and Ivy Bridge CPUs, where it can produce slower code than SSE.
  • Issue #956: Support constant arrays of structured type.
  • Issue #959: Indexing arrays with floating-point numbers isn’t allowed anymore.
  • Issue #955: Add support for 3D CUDA grids and thread blocks.
  • Issue #902: Support numpy.ndindex() in nopython mode.
  • Issue #951: Numpy number types (numpy.int8, etc.) can be used as constructors for type conversion in nopython mode.

Fixes:

  • Issue #889: Fix NUMBA_DUMP_ASSEMBLY for the CUDA backend.
  • Issue #903: Fix calling of stdcall functions with ctypes under Windows.
  • Issue #908: Allow lazy-compiling from several threads at once.
  • Issue #868: Fix wrong error message when multiplying a scalar by a non-scalar.
  • Issue #917: Allow vectorizing with datetime64 and timedelta64 in the signature (only with unit-less values, though, because of a Numpy limitation).
  • Issue #431: Allow overloading of CUDA device functions.
  • Issue #917: Print out errors that occurred in object mode ufuncs.
  • Issue #923: Numba-compiled ufuncs now inherit the name and doc of the original Python function.
  • Issue #928: Fix boolean return value in nested calls.
  • Issue #915: @jit called with an explicit signature with a mismatching type of arguments now raises an error.
  • Issue #784: Fix the truth value of NaNs.
  • Issue #953: Fix using shared memory in more than one function (kernel or device).
  • Issue #970: Fix an uncommon double to uint64 conversion bug on CentOS5 32-bit (C compiler issue).

10.39. Version 0.16.0

This release contains a major refactor to switch from llvmpy to llvmlite as our code generation backend. The switch is necessary to reconcile different compiler requirements for LLVM 3.5 (needs C++11) and Python extensions (need specific compiler versions on Windows). As a bonus, we have found the use of llvmlite speeds up compilation by a factor of 2!

Other Major Changes:

  • Faster dispatch for numpy structured arrays
  • Optimized array.flat()
  • Improved CPU feature selection
  • Fix constant tuple regression in macro expansion code

Known Issues:

  • AVX code generation is still disabled by default due to performance regressions when operating on misaligned NumPy arrays. We hope to have a workaround in the future.
  • In extremely rare circumstances, a known issue with LLVM 3.5 code generation can cause an ELF relocation error on 64-bit Linux systems.

10.40. Version 0.15.1

(This was a bug-fix release that superseded version 0.15 before it was announced.)

Fixes:

  • Workaround for missing __ftol2 on Windows XP.
  • Do not lift loops for compilation that contain break statements.
  • Fix a bug in loop-lifting when multiple values need to be returned to the enclosing scope.
  • Handle the loop-lifting case where an accumulator needs to be updated when the loop count is zero.

10.41. Version 0.15

Features:

  • Support for the Python cmath module. (NumPy complex functions were already supported.)
  • Support for .real, .imag, and .conjugate() on non-complex numbers.
  • Add support for math.isfinite() and math.copysign().
  • Compatibility mode: If enabled (off by default), a failure to compile in object mode will fall back to using the pure Python implementation of the function.
  • Experimental support for serializing JIT functions with cloudpickle.
  • Loop-jitting in object mode now works with loops that modify scalars that are accessed after the loop, such as accumulators.
  • @vectorize functions can be compiled in object mode.
  • Numba can now be built using the Visual C++ Compiler for Python 2.7 on Windows platforms.
  • CUDA JIT functions can be returned by factory functions with variables in the closure frozen as constants.
  • Support for “optional” types in nopython mode, which allow None to be a valid value.

Fixes:

  • If nopython mode compilation fails for any reason, automatically fall back to object mode (unless nopython=True is passed to @jit) rather than raise an exception.
  • Allow function objects to be returned from a function compiled in object mode.
  • Fix a linking problem that caused slower platform math functions (such as exp()) to be used on Windows, leading to performance regressions against NumPy.
  • min() and max() no longer accept scalar arguments in nopython mode.
  • Fix handling of ambiguous type promotion among several compiled versions of a JIT function. The dispatcher will now compile a new version to resolve the problem. (issue #776)
  • Fix float32 to uint64 casting bug on 32-bit Linux.
  • Fix type inference to allow forced casting of return types.
  • Allow the shape of a 1D cuda.shared.array and cuda.local.array to be a one-element tuple.
  • More correct handling of signed zeros.
  • Add custom implementation of atan2() on Windows to handle special cases properly.
  • Eliminated race condition in the handling of the pagelocked staging area used when transferring CUDA arrays.
  • Fix non-deterministic type unification leading to varying performance. (issue #797)

10.42. Version 0.14

Features:

  • Support for nearly all the Numpy math functions (including comparison, logical, bitwise and some previously missing float functions) in nopython mode.
  • The Numpy datetime64 and timedelta64 dtypes are supported in nopython mode with Numpy 1.7 and later.
  • Support for Numpy math functions on complex numbers in nopython mode.
  • ndarray.sum() is supported in nopython mode.
  • Better error messages when unsupported types are used in Numpy math functions.
  • Set NUMBA_WARNINGS=1 in the environment to see which functions are compiled in object mode vs. nopython mode.
  • Add support for the two-argument pow() builtin function in nopython mode.
  • New developer documentation describing how Numba works, and how to add new types.
  • Support for Numpy record arrays on the GPU. (Note: Improper alignment of dtype fields will cause an exception to be raised.)
  • Slices on GPU device arrays.
  • GPU objects can be used as Python context managers to select the active device in a block.
  • GPU device arrays can be bound to a CUDA stream. All subsequent operations (such as memory copies) will be queued on that stream instead of the default. This can prevent unnecessary synchronization with other streams.

Fixes:

  • Generation of AVX instructions has been disabled to avoid performance bugs when calling external math functions that may use SSE instructions, especially on OS X.
  • JIT functions can be removed by the garbage collector when they are no longer accessible.
  • Various other reference counting fixes to prevent memory leaks.
  • Fixed handling of exception when input argument is out of range.
  • Prevent autojit functions from making unsafe numeric conversions when called with different numeric types.
  • Fix a compilation error when an unhashable global value is accessed.
  • Gracefully handle failure to enable faulthandler in the IPython Notebook.
  • Fix a bug that caused loop lifting to fail if the loop was inside an else block.
  • Fixed a problem with selecting CUDA devices in multithreaded programs on Linux.
  • The pow() function (and ** operation) applied to two integers now returns an integer rather than a float.
  • Numpy arrays using the object dtype no longer cause an exception in the autojit.
  • Attempts to write to a global array will cause compilation to fall back to object mode, rather than attempt and fail at nopython mode.
  • range() works with all negative arguments (ex: range(-10, -12, -1))

10.43. Version 0.13.4

Features:

  • Setting and deleting attributes in object mode
  • Added documentation of supported and currently unsupported numpy ufuncs
  • Assignment to 1-D numpy array slices
  • Closure variables and functions can be used in object mode
  • All numeric global values in modules can be used as constants in JIT compiled code
  • Support for the start argument in enumerate()
  • Inplace arithmetic operations (+=, -=, etc.)
  • Direct iteration over a 1D numpy array (e.g. “for x in array: …”) in nopython mode

Fixes:

  • Support for NVIDIA compute capability 5.0 devices (such as the GTX 750)
  • Vectorize no longer crashes/gives an error when bool_ is used as return type
  • Return the correct dictionary when globals() is used in JIT functions
  • Fix crash bug when creating dictionary literals in object mode
  • Report more informative error message on import if llvmpy is too old
  • Temporarily disable pycc --header, which generates incorrect function signatures.

10.44. Version 0.13.3

Features:

  • Support for enumerate() and zip() in nopython mode
  • Increased LLVM optimization of JIT functions to -O1, enabling automatic vectorization of compiled code in some cases
  • Iteration over tuples and unpacking of tuples in nopython mode
  • Support for dict and set (Python >= 2.7) literals in object mode

Fixes:

  • JIT functions have the same __name__ and __doc__ as the original function.
  • Numerous improvements to better match the data types and behavior of Python math functions in JIT compiled code on different platforms.
  • Importing Numba will no longer throw an exception if the CUDA driver is present, but cannot be initialized.
  • guvectorize now properly supports functions with scalar arguments.
  • CUDA driver is lazily initialized

10.45. Version 0.13.2

Features:

  • @vectorize ufuncs can now generate a SIMD fast path for unit-strided arrays
  • Added cuda.gridsize
  • Added preliminary exception handling (raise exception class)

Fixes:

  • Fix handling of the UNARY_POSITIVE opcode
  • Fix handling of closures and dynamically generated functions
  • Fix handling of the global None value

10.46. Version 0.13.1

Features:

  • Initial support for CUDA array slicing

Fixes:

  • Indirectly fixes NumbaPro when the system has an incompatible CUDA driver
  • Fix numba.cuda.detect
  • Export numba.intp and numba.intc

10.47. Version 0.13

Features:

  • Open sourced NumbaPro CUDA Python support in numba.cuda
  • Add support for ufunc array broadcasting
  • Add support for mixed input types for ufuncs
  • Add support for returning tuple from jitted function

Fixes:

  • Fix store slice bytecode handling for Python 2
  • Fix inplace subtract
  • Fix pycc so that correct header is emitted
  • Allow vectorize to work on functions with jit decorator

10.48. Version 0.12.2

Fixes:

  • Improved NumPy ufunc support in nopython mode
  • Misc bug fixes

10.49. Version 0.12.1

This version fixes many regressions reported by users in the 0.12 release. It contains a new loop-lifting mechanism that specializes certain loop patterns for nopython mode compilation, avoiding the need to directly support heap allocation and other very dynamic operations in nopython mode.

Improvements:

  • Add loop-lifting: JIT-compiling loops in nopython mode within object-mode code. This allows functions to allocate NumPy arrays and use Python objects, while the tight loops in the function can still be compiled in nopython mode. Any arrays that the tight loop uses should be created before the loop is entered.
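The shape of code that benefits from loop-lifting can be sketched as follows (plain Python, illustrative only; the function name is hypothetical): the surrounding code may use arbitrary Python objects, while the tight loop touches only scalars and buffers created before it starts.

```python
def smooth(values):
    # Object-mode setup: allocate the output buffer before the loop.
    out = [0.0] * len(values)
    # Liftable tight loop: indexes only pre-built buffers and scalars,
    # so it is a candidate for nopython-mode compilation.
    for i in range(1, len(values) - 1):
        out[i] = (values[i - 1] + values[i] + values[i + 1]) / 3.0
    return out
```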

Fixes:

  • Add support for the majority of “math” module functions
  • Fix for…else handling
  • Add support for builtin round()
  • Fix ternary if…else support
  • Revive “numba” script
  • Fix problems with some boolean expressions
  • Add support for more NumPy ufuncs

10.50. Version 0.12

Version 0.12 contains a big refactor of the compiler. The main objective for this refactor was to simplify the code base to create a better foundation for further work. A secondary objective was to improve the worst case performance to ensure that compiled functions in object mode never run slower than pure Python code (this was a problem in several cases with the old code base). This refactor is still a work in progress and further testing is needed.

Main improvements:

  • Major refactor of compiler for performance and maintenance reasons
  • Better fallback to object mode when native mode fails
  • Improved worst case performance in object mode

The public interface of numba has been slightly changed. The idea is to make it cleaner and more rational:

  • The jit decorator has been modified so that it can be called without a signature. When called without a signature, it behaves like the old autojit. autojit has been deprecated in favour of this approach.
  • Jitted functions can now be overloaded.
  • Added a “njit” decorator that behaves like “jit” decorator with nopython=True.
  • The numba.vectorize namespace is gone. The vectorize decorator will be in the main numba namespace.
  • Added a guvectorize decorator in the main numba namespace. It is similar to numba.vectorize, but takes a dimension signature. It generates gufuncs. This is a replacement for the GUVectorize gufunc factory, which has been deprecated.
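The dual-use jit decorator described above (bare @jit versus @jit(signature)) follows a standard Python decorator pattern, sketched here without Numba (all names and the stored attribute are illustrative only):

```python
def jit(signature_or_function=None):
    # If used bare (@jit), the argument is the function itself; if called
    # with a signature string (@jit("f8(f8)")), return a real decorator.
    def decorate(fn, signature=None):
        fn.signature = signature  # stand-in for lazy vs eager compilation
        return fn
    if callable(signature_or_function):
        return decorate(signature_or_function)
    return lambda fn: decorate(fn, signature_or_function)

@jit
def f(x):
    return x + 1

@jit("f8(f8)")
def g(x):
    return x * 2
```

In the real API, a recorded signature triggers eager compilation, while the bare form defers compilation until the first call, as autojit did.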

Main regressions (will be fixed in a future release):

  • Creating new NumPy arrays is not supported in nopython mode
  • Returning NumPy arrays is not supported in nopython mode
  • NumPy array slicing is not supported in nopython mode
  • lists and tuples are not supported in nopython mode
  • string, datetime, cdecimal, and struct types are not implemented yet
  • Extension types (classes) are not supported in nopython mode
  • Closures are not supported
  • Raise keyword is not supported
  • Recursion is not supported in nopython mode

10.51. Version 0.11

  • Experimental support for NumPy datetime type

10.52. Version 0.10

  • Annotation tool (./bin/numba --annotate --fancy) (thanks to Jay Bourque)
  • Open sourced prange
  • Support for raise statement
  • Pluggable array representation
  • Support for enumerate and zip (thanks to Eugene Toder)
  • Better string formatting support (thanks to Eugene Toder)
  • Builtins min(), max() and bool() (thanks to Eugene Toder)
  • Fix some code reloading issues (thanks to Björn Linse)
  • Recognize NumPy scalar objects (thanks to Björn Linse)

10.53. Version 0.9

  • Improved math support
  • Open sourced generalized ufuncs
  • Improved array expressions

10.54. Version 0.8

  • Support for autojit classes
    • Inheritance not yet supported
  • Python 3 support for pycc
  • Allow retrieval of ctypes function wrapper
    • And hence support retrieval of a pointer to the function
  • Fixed a memory leak of array slicing views

10.55. Version 0.7.2

10.56. Version 0.7.1

  • Various bug fixes

10.57. Version 0.7

  • Open sourced single-threaded ufunc vectorizer
  • Open sourced NumPy array expression compilation
  • Open sourced fast NumPy array slicing
  • Experimental Python 3 support
  • Support for typed containers
    • typed lists and tuples
  • Support for iteration over objects
  • Support object comparisons
  • Preliminary CFFI support
    • Jit calls to CFFI functions (passed into autojit functions)
    • TODO: Recognize ffi_lib.my_func attributes
  • Improved support for ctypes
  • Allow declaring extension attribute types through class attributes
  • Support for type casting in Python
    • Get the same semantics with or without numba compilation
  • Support for recursion
    • For jit methods and extension classes
  • Allow jit functions as C callbacks
  • Friendlier error reporting
  • Internal improvements
  • A variety of bug fixes

10.58. Version 0.6.1

  • Support for bitwise operations

10.59. Version 0.6

  • Python 2.6 support
  • Programmable typing
    • Allow users to add type inference for external code
  • Better NumPy type inference
    • outer, inner, dot, vdot, tensordot, nonzero, where, binary ufuncs + methods (reduce, accumulate, reduceat, outer)
  • Type based alias analysis
    • Support for strict aliasing
  • Much faster autojit dispatch when calling from Python
  • Faster numerical loops through data and stride pre-loading
  • Integral overflow and underflow checking for conversions from objects
  • Make Meta dependency optional

10.60. Version 0.5

  • SSA-based type inference
    • Allows variable reuse
    • Allow referring to variables before lexical definition
  • Support multiple comparisons
  • Support for template types
  • List comprehensions
  • Support for pointers
  • Many bug fixes
  • Added user documentation

10.61. Version 0.4

10.62. Version 0.3.2

  • Add support for object arithmetic (issue 56).
  • Bug fixes (issue 55).

10.63. Version 0.3

  • Changed default compilation approach to ast
  • Added support for cross-module linking
  • Added support for closures (can jit inner functions and return them) (see examples/closure.py)
  • Added support for dtype structures (can access elements of structure with attribute access) (see examples/structures.py)
  • Added support for extension types (numba classes) (see examples/numbaclasses.py)
  • Added support for general Python code (use nopython to raise an error whenever the Python C-API would be used, avoiding unexpected slowness from silently falling back to generic Python objects)
  • Fixed many bugs
  • Added support to detect math operations.
  • Added with python and with nopython contexts
  • Added more examples

Many features need to be documented still. Look at examples and tests for more information.

10.64. Version 0.2

  • Added an ast approach to compilation
  • Removed d, f, i, b from numba namespace (use f8, f4, i4, b1)
  • Changed function to autojit2
  • Added an autojit decorator that uses the argument types at each call to create specialized compiled versions of the function.
  • Changed keyword arguments of the jit and autojit functions to restype and argtypes, for consistency with the ctypes module.
  • Added pycc, a Python-to-shared-library compiler