Continuum Analytics: Documentation

Python with Spark How-tos

These how-tos show you how to run Python tasks on a Spark cluster using the PySpark module, and how to interact with data stored in HDFS on the cluster.
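As a rough orientation, a PySpark task that reads data from HDFS might look like the sketch below. It is not taken from any of the how-tos: the path `hdfs:///tmp/sample.txt`, the application name, and the helper names `tokenize`, `count_words` and `main` are all hypothetical, and the script assumes it is submitted to a cluster where PySpark is installed and the master is supplied at submit time.

```python
import re
from operator import add


def tokenize(line):
    """Split a line of text into lowercase word tokens."""
    return re.findall(r"[a-z0-9']+", line.lower())


def count_words(sc, path):
    """Count word occurrences in a text file using an existing SparkContext.

    `path` can be any URI that Spark can read; an HDFS path is assumed here.
    """
    return (sc.textFile(path)            # one RDD element per line
              .flatMap(tokenize)         # one element per word
              .map(lambda w: (w, 1))     # pair each word with a count of 1
              .reduceByKey(add)          # sum the counts per word
              .collect())                # bring the results back to the driver


def main():
    # PySpark is imported here so the helpers above can be used without it.
    from pyspark import SparkConf, SparkContext

    conf = SparkConf().setAppName("wordcount-sketch")  # hypothetical app name
    sc = SparkContext(conf=conf)
    try:
        # Hypothetical HDFS path; replace with a file that exists on your cluster.
        for word, n in count_words(sc, "hdfs:///tmp/sample.txt"):
            print(word, n)
    finally:
        sc.stop()
```

To run the sketch on a cluster, call `main()` at the end of the script and submit it with `spark-submit`. The individual how-tos cover the standalone and YARN cases in detail.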

These how-tos do not depend on each other and can be completed in any order, but we recommend beginning with the Overview of Spark, YARN and HDFS.

  • Overview of Spark, YARN and HDFS
  • How to Run a Spark Standalone Job
  • How to Run with the YARN resource manager
  • How to perform a word count on text data in HDFS
  • How to do Natural Language Processing
  • How to do Image Processing with GPUs
Privacy Policy | EULA © 2016 Continuum Analytics, Inc. All Rights Reserved.