Continuum Analytics: Documentation

Using Anaconda with Spark

These how-tos show you how to use Anaconda with Apache Spark and PySpark, and how to interact with data stored in the Hadoop Distributed File System (HDFS) on the cluster.

While these how-tos do not depend on each other and can be completed in any order, it is recommended that you begin with the Overview of Spark, YARN and HDFS.

  • Overview of Spark, YARN and HDFS
  • How to run PySpark as a Spark Standalone job
  • How to run PySpark with the YARN resource manager
  • How to perform a word count on text data in HDFS
  • How to perform distributed natural language processing
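As a preview of the pattern the word-count how-to parallelizes with PySpark, here is the same logic sketched in plain Python. This is an illustration only, not the how-to's actual code: PySpark expresses the identical three steps across the cluster as `rdd.flatMap(...)`, `rdd.map(...)`, and `rdd.reduceByKey(add)`; the sample `lines` data is hypothetical.

```python
# Cluster-free sketch of the word-count pattern that PySpark distributes:
#   flatMap  -> split every line into words
#   map      -> pair each word with a count of 1
#   reduceByKey -> sum the counts for each distinct word
from operator import add

lines = [
    "hello spark",
    "hello hdfs",
]

# flatMap: one flat list of words from all lines
words = [word for line in lines for word in line.split()]

# map: (word, 1) pairs
pairs = [(word, 1) for word in words]

# reduceByKey: sum the 1s for each distinct word
counts = {}
for word, n in pairs:
    counts[word] = add(counts.get(word, 0), n)

print(counts)  # {'hello': 2, 'spark': 1, 'hdfs': 1}
```

On a real cluster, the same pipeline runs over an HDFS file loaded with `sc.textFile(...)`, with each step executed in parallel across the workers.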
© 2016 Continuum Analytics, Inc. All Rights Reserved.