Version (0.21.0)¶
NumbaPro will be deprecated, with most code generation features moved into the open-source Numba and the CUDA bindings moved into a new commercial package called “Accelerate”. The new package will feature more high-level API functions from the CUDA libraries as well as MKL.
The next release of NumbaPro will provide aliases to the features that are moved to Numba and Accelerate. No new features will be added to NumbaPro. In the future, there may be bug-fix releases to maintain the aliases to the moved features.
Changes:
- Depends on numba 0.21.0
- Fix auto thread-per-block tuning support for CUDA CC 3.7 devices
- Blas.dotu is deprecated. A warning is generated when it is used. Blas.dot is an alias to it and is preferred.
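The deprecation behaviour described for Blas.dotu can be sketched in plain Python. This is only an illustration under stated assumptions, not NumbaPro's code: the Blas class below is a hypothetical stand-in that works on Python lists, and the warning category and message are assumed. Here the deprecated name forwards to the preferred one, which is one common way to keep two names pointing at the same routine.

```python
import warnings


class Blas:
    """Hypothetical stand-in for a BLAS wrapper (not the NumbaPro class)."""

    def dot(self, x, y):
        # Preferred entry point: plain dot product of two equal-length sequences.
        if len(x) != len(y):
            raise ValueError("x and y must have the same length")
        return sum(a * b for a, b in zip(x, y))

    def dotu(self, x, y):
        # Deprecated name: emit a warning, then forward to the preferred method.
        warnings.warn(
            "Blas.dotu is deprecated; use Blas.dot instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.dot(x, y)
```

Callers that still use the old name keep getting the same result, and the warning points at their own call site because of stacklevel=2.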
Version (0.20.0)¶
This release depends on numba 0.20, which has upgraded to CUDA 7 for GPU support. CUDA 7 deprecates support for all 32-bit platforms. The oldest supported Windows version is Windows 7. This does not affect CPU features.
Version (0.19.0)¶
- Depends on numba 0.19
- Fixes issue with GPU ufunc broadcasting
- Improves GPU ufunc implementation
Version (0.18.0)¶
- Depends on numba 0.18.1
- Improve CUDA gufunc implementation
- Simplified code generation
- Smarter blocksize selection
Version (0.17.1)¶
- Depends on numba 0.17.0
- Warns about incompatible numba version at import time
- Fixes some CUDA library APIs on Windows
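An import-time compatibility warning of the kind mentioned above could look roughly like the following sketch. Everything here is an assumption made for illustration: the function names are hypothetical, and the rule that the major and minor components must match is a guess, not NumbaPro's actual check.

```python
import warnings


def parse_version(text):
    # Keep only the leading numeric components, e.g. "0.17.0" -> (0, 17, 0).
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def check_dependency(installed, required, name="numba"):
    """Warn (rather than fail) when the major.minor versions differ."""
    if parse_version(installed)[:2] != parse_version(required)[:2]:
        warnings.warn(
            f"{name} {installed} may be incompatible; expected {required}",
            RuntimeWarning,
            stacklevel=2,
        )
        return False
    return True
```

Calling such a check from a package's top-level module would surface the mismatch as soon as the package is imported, while still letting the import succeed.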
Version (0.17.0)¶
- Depends on numba 0.16.0
- Replaces llvmpy with llvmlite, which also upgrades to llvm3.5
- Update occupancy autotuner for CC 5.0 and CC 5.2 devices
- Fix handling of empty array in GPU reduction
- Fix occupancy autotuner that may pick invalid blocksize
Version (0.16.0)¶
- Add numbapro.cuda.reduce for autogeneration of CUDA reduce kernels and driver.
- Fix device-to-host automatic transfer logic in some ufuncs.
- Upgrades to Numba 0.15
Version (0.15.0)¶
- Add numbapro.cudalib.sorting:
  - Added GPU radixsort and radixselect using implementation from http://nvlabs.github.io/cub
  - Added GPU segmentedsort from http://nvlabs.github.io/moderngpu
- Fix GPU print() when there are multiple arguments
Version (0.14.3)¶
- CUDA driver is initialized lazily
- Improved stability of CUDA ufunc machinery
- Improved stability of parallel ufunc
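Lazy initialization, as in the first item above, means the expensive setup step runs on first use instead of at import. A minimal, generic sketch follows; the LazyDriver class and its initializer argument are hypothetical and do not reflect how NumbaPro manages the CUDA driver.

```python
import threading


class LazyDriver:
    """Hypothetical wrapper that defers an expensive setup step until first use."""

    def __init__(self, initializer):
        self._initializer = initializer
        self._handle = None
        self._initialized = False
        self._lock = threading.Lock()

    def get(self):
        # Double-checked locking so concurrent first calls initialize only once.
        if not self._initialized:
            with self._lock:
                if not self._initialized:
                    self._handle = self._initializer()
                    self._initialized = True
        return self._handle
```

Constructing the wrapper costs nothing; the initializer runs only when get() is first called, and later calls reuse the stored handle.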
Version (0.14.2)¶
- Unify numba.cuda and numbapro.cuda backend
- Enable Python 3 support
- Fixes workqueue module import for the embedded Python use case
Version (0.14.1)¶
Fixes:
- UnboundReferenceError due to mishandling of an incompatible driver (pre-CUDA 5.5 driver). The fix relaxes the driver requirement by allowing some features to fail on use.
- numbapro.cuda.* symbols are still exported when CUDA is not available. They raise an exception on use.
Version (0.14.0)¶
Features:
- Add cuSparse API
- Improve CUDA driver and resource management
- Some CUDA-Python language features are now open-sourced as numba.cuda
Fixes:
- New CUDA driver system prevents freezing OSX on kernel launch error
Version (0.13.2)¶
- Fix problem with numpy 1.8 array scalar contiguousness
- Fix CUDA target auto initialization on import numbapro
- Fix an access violation error on Windows 8 due to mishandling by LLVM.
- Add non-public API for profiler control.
Version (0.13.1)¶
- Guard error due to mishandling of interleaved memory buffer (#60)
- Update to use Numba 0.12.1
- Fix powi bug
Version (0.13)¶
- Add print statement for strings and scalar numeric types for debugging on GPU
- Add constant and local memory array allocation on GPU
- Add debug mode for GPU
- Allow raising exception classes on GPU
- Update CUDA toolkit libraries
- Fix boolean mapping
Version (0.12.7)¶
- Fix major bug that mistreats py2 division as inplace floor-division for real numbers.
- Fix use of an array as an argument of a CUDA device function.
- Delay initialization of the CUDA subsystem until the first import of the cuda package.
- Add docstrings.
Version (0.12.6)¶
- Fix major bug that mistreats py2 division as floor-division for real numbers.
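To make the class of bug concrete: for real (floating-point) operands, ordinary division and floor division give different answers, so a compiler that substitutes one for the other silently changes results. The snippet below only illustrates the two semantics in standard Python; the operand values are arbitrary.

```python
import operator

x, y = 7.0, 2.0

# Ordinary division on floats keeps the fractional part.
true_quotient = operator.truediv(x, y)

# Floor division rounds the quotient down to a whole number.
floor_quotient = operator.floordiv(x, y)

print(true_quotient, floor_quotient)
```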
Version (0.12.5)¶
- Update to Numba 0.10.2
- Update to LLVM 3.3
- Various bug fixes
Version (0.12.4)¶
- Update to Numba 0.10.0
- Minor bug fixes
Version (0.12.3)¶
- Accept older drivers by deferring driver errors to first use of a specific API
- Report incompatible GPU at context creation
- Improve device information reporting
- Autotuning based on compiler info and occupancy calculator
- Add basic support for ravel and reshape
Version (0.12.2)¶
- Distribute CUDA toolkit in Anaconda
- Better error message
- Fix gufunc signature parsing to accept trailing comma.
- Fix CUDA driver log info bug
- Support JIT linking
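As an illustration of the trailing-comma fix listed above, here is a small sketch of a tolerant parser for a gufunc layout string such as "(m,n),(n,p)->(m,p)". It is an assumption that the release note refers to this kind of signature, and parse_layout is a hypothetical helper, not NumbaPro's parser.

```python
import re

# Matches one parenthesised group of dimension names, e.g. "(m,n)" or "(n,)".
_GROUP = re.compile(r"\(([^()]*)\)")


def parse_layout(signature):
    """Split '(m,n),(n,p)->(m,p)' into input and output dimension tuples."""
    inputs, sep, outputs = signature.partition("->")
    if not sep:
        raise ValueError("missing '->' in signature")

    def groups(text):
        result = []
        for body in _GROUP.findall(text):
            # Dropping empty names lets '(n,)' parse the same as '(n)'.
            names = tuple(n.strip() for n in body.split(",") if n.strip())
            result.append(names)
        return result

    return groups(inputs), groups(outputs)
```

With this approach "(n,),()->(n,)" yields one-dimensional and scalar operands without treating the trailing comma as an extra, empty dimension name.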
Version (0.12.1)¶
- Fix libNVVM search path (now accepts a directory path)
- Fix sign-extension error in for-loop precondition
- Fix support for true-division
Version (0.12.0)¶
- Use CUDA 5.5rc
- Expand math support through CUDA NVVM libdevice
- Rewritten nopython mode for CUDA-Python
- Removed experimental CU API
- Removed minivectorize
Version (0.11.0)¶
- Add cuBlas binding
- Improve CUDA ndarray and memory management
- Add CUDA mapped host memory
- Add CUDA event
Version (0.10.1)¶
- Fix CU memory leak
- Fix CU hanging on some GPUs
- Improve error message for unsupported GPU devices
- Add cuFFT
Version (0.10)¶
- Added Compute Unit (CU) API
- Added cuRAND binding
- Added CUDA device array
- Various improvements to CUDA support
Version (0.9)¶
- Improve CUDA driver discovery.
Version 0.8¶
- Update for SSA type inference in Numba
- Allow user to select CUDA device
- Add support for pinned and mapped CUDA memory
- Improvement on small memory allocation in CUDA
- Default to use libNVVM from Anaconda
- Bug fixes
Version 0.7¶
- Prange: parallel for-range
- Array slicing
- Refactor CUDA dispatch mechanisms
- Migrate to NVVM instead of PTX for CUDA codegen
Version 0.6 and earlier¶
- Array expressions
- Fast ufuncs and generalized-ufunc (gufunc) with single-core, multi-core and CUDA
- CUDA JIT.