Using Anaconda with Spark

These how-tos show you how to use Anaconda with Apache Spark and PySpark, and how to interact with data stored in the Hadoop Distributed File System (HDFS) on the cluster.

While these how-tos are not dependent on each other and can be completed in any order, it is recommended that you begin with Configuring Anaconda with Spark. A short PySpark sketch follows the list below to show the kind of code these how-tos build toward.

  • Configuring Anaconda with Spark
  • How to run PySpark as a Spark Standalone job
  • How to run PySpark with the YARN resource manager
  • How to perform a word count on text data in HDFS
  • How to perform distributed natural language processing
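
To give a flavor of what these how-tos cover, here is a minimal PySpark sketch that reads text data from HDFS and performs a word count. It is a sketch only: the Anaconda interpreter path, the application name, and the HDFS input path are assumptions you should adjust for your cluster, and Configuring Anaconda with Spark explains how to set them correctly.

    from pyspark import SparkConf, SparkContext

    # Point the executors at the Python interpreter from your Anaconda
    # installation (assumed path; adjust it for your environment).
    conf = (SparkConf()
            .setAppName("anaconda-wordcount")
            .set("spark.executorEnv.PYSPARK_PYTHON", "/opt/anaconda/bin/python"))
    sc = SparkContext(conf=conf)

    # Read text data from HDFS (hypothetical path), split lines into words,
    # and count occurrences with a map/reduce pipeline.
    lines = sc.textFile("hdfs:///user/example/input.txt")
    counts = (lines.flatMap(lambda line: line.split())
                   .map(lambda word: (word, 1))
                   .reduceByKey(lambda a, b: a + b))

    print(counts.take(10))
    sc.stop()

The same pattern, submitted either as a Spark Standalone job or through the YARN resource manager, underlies the word count and distributed natural language processing how-tos listed above.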