
Read JSON records from disk

We commonly use dask.bag to process unstructured or semi-structured data:

>>> import dask.bag as db
>>> import json
>>> js = db.read_text('logs/2015-*.json.gz').map(json.loads)
>>> js.take(2)
({'name': 'Alice', 'location': {'city': 'LA', 'state': 'CA'}},
 {'name': 'Bob', 'location': {'city': 'NYC', 'state': 'NY'}})

>>> result = js.pluck('name').frequencies()  # just another Bag
>>> dict(result)                             # Evaluate Result
{'Alice': 10000, 'Bob': 5555, 'Charlie': ...}
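
To make the semantics of the pipeline concrete, here is a minimal plain-Python sketch of what reading line-delimited JSON and counting the `name` field amounts to. It uses only the standard library and runs eagerly in one process, so it illustrates the result rather than how Dask computes it. The helper `name_frequencies`, the sample records, and the temporary file names are all hypothetical and chosen for illustration.

```python
import glob
import gzip
import json
import os
import tempfile
from collections import Counter


def name_frequencies(pattern):
    """Count how often each 'name' value occurs across gzipped JSON-lines files.

    Hypothetical helper: an eager, single-process stand-in for
    read_text(...).map(json.loads).pluck('name').frequencies().
    """
    counts = Counter()
    for path in sorted(glob.glob(pattern)):
        # Each file holds one JSON document per line.
        with gzip.open(path, "rt", encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if line:
                    counts[json.loads(line)["name"]] += 1
    return dict(counts)


# Made-up sample records, shaped like the ones shown above.
records = [
    {"name": "Alice", "location": {"city": "LA", "state": "CA"}},
    {"name": "Bob", "location": {"city": "NYC", "state": "NY"}},
    {"name": "Alice", "location": {"city": "LA", "state": "CA"}},
]

with tempfile.TemporaryDirectory() as tmp:
    # Write the sample across two gzipped files so the glob matches several inputs.
    for day, chunk in (("2015-01-01", records[:2]), ("2015-01-02", records[2:])):
        with gzip.open(os.path.join(tmp, day + ".json.gz"), "wt", encoding="utf-8") as f:
            for record in chunk:
                f.write(json.dumps(record) + "\n")

    result = name_frequencies(os.path.join(tmp, "2015-*.json.gz"))

print(result)
```

The bag version expresses the same steps lazily and can process the matched files in parallel, which is why it is preferred once the logs no longer fit comfortably in a single loop.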
2017 Anaconda, Inc.
All Rights Reserved.