Changelog¶
1.19.1 - September 25th, 2017¶
1.19.0 - September 24th, 2017¶
- Avoid storing messages in message log (GH#1361)
- fileConfig does not disable existing loggers (GH#1380)
- Offload upload_file disk I/O to separate thread (GH#1383)
- Add missing SSLContext (GH#1385)
- Collect worker thread information from sys._current_frames (GH#1387)
- Add nanny timeout (GH#1395)
- Restart worker if memory use goes above 95% (GH#1397)
- Track workers memory use with psutil (GH#1398)
- Track scheduler delay times in workers (GH#1400)
- Add time slider to profile plot (GH#1403)
- Change memory-limit keyword to refer to maximum number of bytes (GH#1405)
- Add cancel(force=) keyword (GH#1408) (see the sketch after this list)
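A minimal sketch of the new force= option, assuming a local client; the workload is illustrative:

    from distributed import Client

    client = Client()                          # local cluster for illustration
    future = client.submit(sum, [1, 2, 3])

    # force=True cancels the future even if other clients hold references to it
    client.cancel(future, force=True)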
1.18.2 - September 2nd, 2017¶
1.18.1 - August 25th, 2017¶
- Clean up forgotten keys in fire-and-forget workloads (GH#1250)
- Handle missing extensions (GH#1263)
- Allow recreate_exception on persisted collections (GH#1253)
- Add asynchronous= keyword to blocking client methods (GH#1272) (see the sketch after this list)
- Restrict to horizontal panning in bokeh plots (GH#1274)
- Rename client.shutdown to client.close (GH#1275)
- Avoid blocking on event loop (GH#1270)
- Avoid cloudpickle errors for Client.get_versions (GH#1279)
- Yield on Tornado IOStream.write futures (GH#1289)
- Assume async behavior if inside a sync statement (GH#1284)
- Avoid error messages on closing (GH#1297) (GH#1296) (GH#1318) (GH#1319)
- Add timeout= keyword to get_client (GH#1290)
- Respect timeouts when restarting (GH#1304)
- Clean file descriptor and memory leaks in tests (GH#1317)
- Deprecate Executor (GH#1302)
- Add timeout to ThreadPoolExecutor.shutdown (GH#1330)
- Clean up AsyncProcess handling (GH#1324)
- Allow unicode keys in Python 2 scheduler (GH#1328)
- Avoid leaking stolen data (GH#1326)
- Improve error handling on failed nanny starts (GH#1337), (GH#1331)
- Make Adaptive more flexible
- Support --contact-address and --listen-address in worker (GH#1278)
- Remove old dworker, dscheduler executables (GH#1355)
- Exit workers if nanny process fails (GH#1345)
- Auto pep8 and flake (GH#1353)
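A minimal sketch of the asynchronous= keyword from GH#1272, assuming a local cluster is acceptable; the coroutine body is illustrative:

    import asyncio
    from distributed import Client

    async def main():
        # asynchronous=True makes blocking client methods return awaitables
        client = await Client(asynchronous=True)
        future = client.submit(lambda x: x + 1, 10)
        print(await client.gather(future))     # 11
        await client.close()

    asyncio.run(main())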
1.18.0 - July 8th, 2017¶
- Multi-threading safety (GH#1191), (GH#1228), (GH#1229)
- Improve handling of byte counting (GH#1198) (GH#1224)
- Add get_client, secede functions, refactor worker-client relationship (GH#1201) (see the sketch after this list)
- Allow logging configuration using logging.dictConfig() (GH#1206) (GH#1211)
- Offload serialization and deserialization to separate thread (GH#1218)
- Support fire-and-forget tasks (GH#1221)
- Support bytestrings as keys (for Julia) (GH#1234)
- Resolve testing corner-cases (GH#1236), (GH#1237), (GH#1240), (GH#1241), (GH#1242), (GH#1244)
- Automatic use of scatter/gather(direct=True) in more cases (GH#1239)
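A minimal sketch of tasks that launch tasks via get_client and secede (GH#1201), plus a fire-and-forget submission (GH#1221), assuming a local client; the functions are illustrative:

    from distributed import Client, fire_and_forget, get_client, secede

    def inc(x):
        return x + 1

    def parent(n):
        client = get_client()          # client connected from inside the task
        futures = client.map(inc, range(n))
        secede()                       # leave the worker's thread pool while waiting
        return sum(client.gather(futures))

    client = Client()
    print(client.submit(parent, 10).result())      # 55
    fire_and_forget(client.submit(print, "done"))  # scheduler keeps this alive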
1.17.1 - June 14th, 2017¶
1.17.0 - June 9th, 2017¶
- Reevaluate worker occupancy periodically during scheduler downtime (GH#1038) (GH#1101)
- Add AioClient asyncio-compatible client API (GH#1029) (GH#1092) (GH#1099)
- Update Keras serializer (GH#1067)
- Support TLS/SSL connections for security (GH#866) (GH#1034)
- Always create new worker directory when passed --local-directory (GH#1079)
- Support pre-scattering data when using joblib frontend (GH#1022)
- Make workers more robust to failure of sizeof function (GH#1108) and writing to disk (GH#1096)
- Add is_empty and update methods to as_completed (GH#1113) (see the sketch after this list)
- Remove _get coroutine and replace with get(..., sync=False) (GH#1109)
- Improve API compatibility with async/await syntax (GH#1115) (GH#1124)
- Add distributed Queues (GH#1117) and shared Variables (GH#1128) to enable inter-client coordination
- Support direct client-to-worker scattering and gathering (GH#1130) as well as performance enhancements when scattering data
- Style improvements for bokeh web dashboards (GH#1126) (GH#1141) as well as a removal of the external bokeh process
- HTML reprs for Future and Client objects (GH#1136)
- Support nested collections in client.compute (GH#1144)
- Use normal client API in asynchronous mode (GH#1152)
- Remove old distributed.collections submodule (GH#1153)
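A minimal sketch of the new is_empty and update methods on as_completed, assuming a local client; the workload is illustrative:

    from distributed import Client, as_completed

    def inc(x):
        return x + 1

    client = Client()
    ac = as_completed(client.map(inc, range(3)))

    ac.update(client.map(inc, range(3, 6)))  # feed more futures into the iterator
    results = [future.result() for future in ac]
    assert ac.is_empty()                     # all futures have been consumed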
1.16.3 - May 5th, 2017¶
1.16.2 - May 3rd, 2017¶
- Support async with Client syntax (GH#1053) (see the sketch after this list)
- Use internal bokeh server for default diagnostics server (GH#1047)
- Improve styling of bokeh plots when empty (GH#1046) (GH#1037)
- Support efficient serialization for sparse arrays (GH#1040)
- Prioritize newly arrived work in worker (GH#1035)
- Prescatter data with joblib backend (GH#1022)
- Make client.restart more robust to worker failure (GH#1018)
- Support preloading a module or script in dask-worker or dask-scheduler processes (GH#1016)
- Specify network interface in command line interface (GH#1007)
- Client.scatter supports a single element (GH#1003)
- Use blosc compression on all memoryviews passing through comms (GH#998)
- Add concurrent.futures-compatible Executor (GH#997)
- Add as_completed.batches method and return results (GH#994) (GH#971)
- Allow worker_clients to optionally stay within the thread pool (GH#993)
- Add bytes-stored and tasks-processing diagnostic histograms (GH#990)
- Run supports non-msgpack-serializable results (GH#965)
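A minimal sketch of the async with Client syntax; note that in current releases this pairs with the asynchronous=True keyword, which arrived later (1.18.1), and the workload is illustrative:

    import asyncio
    from distributed import Client

    async def main():
        # the client starts on entry and closes again on exit
        async with Client(asynchronous=True) as client:
            future = client.submit(sum, [1, 2, 3])
            print(await future)   # 6

    asyncio.run(main())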
1.16.1 - March 22nd, 2017¶
- Use inproc transport in LocalCluster (GH#919)
- Add structured and queryable cluster event logs (GH#922)
- Use connection pool for inter-worker communication (GH#935)
- Robustly shut down spawned worker processes at shutdown (GH#928)
- Worker death timeout (GH#940)
- More visual reporting of exceptions in progressbar (GH#941)
- Render disk and serialization events to task stream visual (GH#943)
- Support async for / await protocol (GH#952)
- Ensure random generators are re-seeded in worker processes (GH#953)
- Upload source code as zip module (GH#886)
- Replay remote exceptions in local process (GH#894)
1.16.0 - February 24th, 2017¶
- First come first served priorities on client submissions (GH#840)
- Can specify Bokeh internal ports (GH#850)
- Allow stolen tasks to return from either worker (GH#853), (GH#875)
- Add worker resource constraints during execution (GH#857)
- Send small data through Channels (GH#858)
- Better estimates for SciPy sparse matrix memory costs (GH#863)
- Avoid stealing long running tasks (GH#873)
- Maintain Fortran ordering of NumPy arrays (GH#876)
- Add --scheduler-file keyword to dask-scheduler (GH#877) (see the sketch after this list)
- Add serializer for Keras models (GH#878)
- Support uploading modules from zip files (GH#886)
- Improve titles of Bokeh dashboards (GH#895)
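A minimal sketch of connecting through a scheduler file; it assumes dask-scheduler was started with --scheduler-file, and the JSON path is illustrative:

    from distributed import Client

    # the scheduler writes its address into this JSON file at startup, so
    # clients and workers can locate it without hard-coded addresses
    client = Client(scheduler_file="/path/to/scheduler.json")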
1.15.2 - January 27th, 2017¶
- Fix a bug where arrays with large dtypes or shapes were being improperly compressed (GH#830 GH#832 GH#833)
- Extend as_completed to accept new futures during iteration (GH#829)
- Add --nohost keyword to dask-ssh startup utility (GH#827)
- Support scheduler shutdown of remote workers, useful for adaptive clusters (GH#811 GH#816 GH#821)
- Add Client.run_on_scheduler method for running debug functions on the scheduler (GH#808) (see the sketch after this list)
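A minimal sketch of Client.run_on_scheduler, assuming a local client; the inspection function follows the pattern from the method's documentation:

    from distributed import Client

    def n_tasks(dask_scheduler=None):
        # the scheduler passes itself in through the dask_scheduler= keyword
        return len(dask_scheduler.tasks)

    client = Client()
    print(client.run_on_scheduler(n_tasks))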
1.15.1 - January 11th, 2017¶
- Make compatible with Bokeh 0.12.4 (GH#803)
- Avoid compressing arrays if not helpful (GH#777)
- Optimize inter-worker data transfer (GH#770) (GH#790)
- Add --local-directory keyword to worker (GH#788)
- Enable workers to arrive in the cluster with their own data. Useful if a worker leaves and comes back (GH#785)
- Resolve thread safety bug when using local_client (GH#802)
- Resolve scheduling issues in worker (GH#804)
1.15.0 - January 2nd, 2017¶
- Major Worker refactor (GH#704)
- Major Scheduler refactor (GH#717) (GH#722) (GH#724) (GH#742) (GH#743)
- Add check (default is False) option to Client.get_versions to raise if the versions don’t match on client, scheduler & workers (GH#664) (see the sketch after this list)
- Future.add_done_callback executes in separate thread (GH#656)
- Clean up numpy serialization (GH#670)
- Support serialization of Tornado v4.5 coroutines (GH#673)
- Use CPickle instead of Pickle in Python 2 (GH#684)
- Use Forkserver rather than Fork on Unix in Python 3 (GH#687)
- Support abstract resources for per-task constraints (GH#694) (GH#720) (GH#737)
- Add TCP timeouts (GH#697)
- Add embedded Bokeh server to workers (GH#709) (GH#713) (GH#738)
- Add embedded Bokeh server to scheduler (GH#724) (GH#736) (GH#738)
- Add more precise timers for Windows (GH#713)
- Add Versioneer (GH#715)
- Support inter-client channels (GH#729) (GH#749)
- Scheduler Performance improvements (GH#740) (GH#760)
- Improve load balancing and work stealing (GH#747) (GH#754) (GH#757)
- Run Tornado coroutines on workers
- Avoid slow sizeof call on Pandas dataframes (GH#758)
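A minimal sketch of the check option on Client.get_versions, assuming a local client:

    from distributed import Client

    client = Client()
    # with check=True this raises if client, scheduler, and workers report
    # mismatched package versions; the default check=False only reports them
    versions = client.get_versions(check=True)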
1.14.3 - November 13th, 2016¶
1.14.2 - November 11th, 2016¶
1.14.0 - November 3rd, 2016¶
- Add Client.get_versions() function to return software and package information from the scheduler, workers, and client (GH#595)
- Improved Windows support (GH#577) (GH#590) (GH#583) (GH#597)
- Clean up rpc objects explicitly (GH#584)
- Normalize collections against known futures (GH#587)
- Add key= keyword to map to specify keynames (GH#589) (see the sketch after this list)
- Custom data serialization (GH#606)
- Refactor the web interface (GH#608) (GH#615) (GH#621)
- Allow user-supplied Executor in Worker (GH#609)
- Pass Worker kwargs through LocalCluster
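A minimal sketch of the key= keyword to map, assuming a local client; the key prefix is illustrative:

    from distributed import Client

    def inc(x):
        return x + 1

    client = Client()
    # the given prefix appears in the dashboard and logs as increment-<token>
    futures = client.map(inc, range(3), key="increment")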
1.13.3 - October 15th, 2016¶
- Schedulers can retire workers cleanly
- Add Future.add_done_callback for concurrent.futures compatibility (see the sketch after this list)
- Update web interface to be consistent with Bokeh 0.12.3
- Close streams explicitly, avoiding race conditions and supporting more robust restarts on Windows.
- Improved shuffled performance for dask.dataframe
- Add adaptive allocation cluster manager
- Reduce administrative overhead when dealing with many workers
- dask-ssh --log-directory . no longer errors
- Microperformance tuning for the scheduler
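A minimal sketch of Future.add_done_callback, assuming a local client; the callback is illustrative:

    from distributed import Client

    def report(future):
        # invoked in a background thread once the future finishes
        print(future.status, future.result())

    client = Client()
    future = client.submit(sum, [1, 2, 3])
    future.add_done_callback(report)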
1.13.2¶
- Revert dask_worker to use fork rather than subprocess by default
- Scatter retains type information
- Bokeh always uses subprocess rather than spawn
1.13.1¶
- Fix critical Windows error with dask_worker executable
1.13.0¶
- Rename Executor to Client (GH#492)
- Add --memory-limit option to dask-worker, enabling spill-to-disk behavior when running out of memory (GH#485)
- Add --pid-file option to dask-worker and dask-scheduler (GH#496)
- Add upload_environment function to distribute conda environments. This is experimental, undocumented, and may change without notice. (GH#494)
- Add workers= keyword argument to Client.compute and Client.persist, supporting location-restricted workloads with Dask collections (GH#484)
- Add optional dask_worker= keyword to client.run functions that receives the worker or nanny object (see the sketch after this list)
- Add nanny=False keyword to Client.run, allowing for the execution of arbitrary functions on the nannies as well as normal workers
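A minimal sketch of client.run with the dask_worker= keyword, assuming a local client; the function is illustrative:

    from distributed import Client

    def address(dask_worker=None):
        # each worker passes itself in through the dask_worker= keyword
        return dask_worker.address

    client = Client()
    print(client.run(address))   # maps worker address -> return value
    # client.run(address, nanny=True) would run on the nanny processes instead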
1.12.2¶
This release adds some new features and removes dead code
- Publish and share datasets on the scheduler between many clients (GH#453). See Publish Datasets and the sketch after this list.
- Launch tasks from other tasks (experimental) (GH#471). See Launch Tasks from Tasks.
- Remove unused code, notably the Center object and older client functions (GH#478)
- Executor() and LocalCluster() are now robust to Bokeh’s absence (GH#481)
- Removed s3fs and boto3 from requirements. These have moved to Dask.
1.12.1¶
This release is largely a bugfix release, recovering from the previous large refactor.
- Fixes from previous refactor
  - Ensure idempotence across clients
  - Stress test losing scattered data permanently
- IPython fixes
  - Add start_ipython_scheduler method to Executor
  - Add %remote magic for workers
  - Clean up code and tests
- Pool connects to maintain reuse and reduce number of open file handles
- Re-implement work stealing algorithm
- Support cancellation of tuple keys, such as occur in dask.arrays
- Start synchronizing against worker data that may be superfluous
- Improve bokeh plots styling
  - Add memory plot tracking number of bytes
  - Make the progress bars more compact and align colors
  - Add workers/ page with workers table, stacks/processing plot, and memory
- Add this release notes document
1.12.0¶
This release was largely a refactoring release. Internals were changed significantly without many new features.
- Major refactor of the scheduler to use transitions system
- Tweak protocol to traverse down complex messages in search of large bytestrings
- Add dask-submit and dask-remote
- Refactor HDFS writing to align with changes in the dask library
- Executor reconnects to scheduler on broken connection or failed scheduler
- Support sklearn.external.joblib as well as normal joblib