
Examples

Array

Array documentation

  • Creating Dask arrays from NumPy arrays
  • Creating Dask arrays from HDF5 Datasets
  • Creating random arrays
  • Build Custom Dask.Array Function
  • Blogpost: Distributed NumPy and Image Analysis on a Cluster, January 2017
  • Use Dask.array to generate task graphs
  • Alternating Least Squares for collaborative filtering
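
As a small illustration of the first topic in this list, the following sketch wraps an in-memory NumPy array as a Dask array and runs two reductions on it. The array shape and chunk sizes are arbitrary choices made for this illustration; they do not come from the linked examples.

```python
import numpy as np
import dask.array as da

# A small NumPy array standing in for real data (illustrative values only).
x = np.arange(12).reshape(3, 4)

# Wrap it as a Dask array split into two column blocks of shape (3, 2).
d = da.from_array(x, chunks=(3, 2))

# Operations are lazy: they build a task graph rather than computing right away.
total = d.sum()
col_means = d.mean(axis=0)

# compute() runs the graph and returns concrete NumPy results.
result = total.compute()
col_means = col_means.compute()
```

Real workloads would typically use much larger chunks; tiny chunks like these only serve to show how the blocks are laid out.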

Bag

Bag documentation

  • Read JSON records from disk
  • Word count
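
A word count along the lines of the entry above might look like the following minimal sketch. The input lines are made up for illustration, and the single-threaded scheduler is selected only to keep the example self-contained; the linked example may read from files and use a different scheduler.

```python
import dask.bag as db

# Made-up input lines; a real example would more likely read text files.
lines = ["the quick brown fox", "the lazy dog", "the end"]

# Build a bag with two partitions from the in-memory sequence.
b = db.from_sequence(lines, npartitions=2)

# Split each line into words, flatten to one bag of words, then count them.
word_counts = b.map(str.split).flatten().frequencies()

# The synchronous scheduler runs everything in the current thread.
counts = dict(word_counts.compute(scheduler="synchronous"))
```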

DataFrame

DataFrame documentation

  • Dataframes from CSV files
  • Dataframes from HDF5 files
  • Blogpost: Dataframes on a cluster, January 2017
  • Distributed DataFrames on NYCTaxi data
  • Build Parallel Algorithms for Pandas
  • Simple distributed joins
  • Build Dask.dataframes from custom format, feather

Delayed

Delayed documentation

  • Build Custom Arrays
  • Data Processing Pipelines
  • Blogpost: Delayed on a cluster, January 2017
  • Blogpost: Dask and Celery, September 2016
  • Basic Delayed example
  • Build Parallel Algorithms for Pandas
  • Build Dask.dataframes from custom format, feather
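
For orientation, a basic delayed computation can be sketched as below. The function names `inc` and `add` are hypothetical helpers chosen for this illustration and are not taken from the linked examples.

```python
from dask import delayed


@delayed
def inc(x):
    # Hypothetical helper: add one to its argument.
    return x + 1


@delayed
def add(x, y):
    # Hypothetical helper: add two values.
    return x + y


# Calling delayed functions records tasks instead of running them.
a = inc(1)
b = inc(2)
total = add(a, b)

# compute() executes the recorded task graph; inc(1) and inc(2) are
# independent tasks, so a scheduler is free to run them in parallel.
result = total.compute()
```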

Distributed Concurrent.futures

Concurrent.futures documentation

  • Custom workflows
  • Ad Hoc Distributed Random Forests
  • Web Servers and Asynchronous task scheduling

Tutorial

A Dask tutorial from July 2015 (fairly old) is available here: https://github.com/dask/dask-tutorial

2017 Anaconda, Inc.
All Rights Reserved.