Using Anaconda with Cloudera CDH¶
There are two ways to use Anaconda Scale on a cluster with Cloudera CDH:
- The Anaconda parcel for Cloudera CDH
- A dynamic, managed version of Anaconda on all of the nodes
SEE ALSO: Blog post Self-service Open Data Science: Custom Anaconda parcels for Cloudera.
The freely available Anaconda parcel is based on Python 2.7 and includes the default conda packages that are available in the free Anaconda distribution.
Anaconda Workgroup and Anaconda Enterprise subscribers can also use Anaconda Repository to create and distribute their own custom Anaconda parcels for Cloudera Manager.
If you need more flexibility than the Anaconda parcels offer, Anaconda Scale can also install and manage multiple conda environments (such as Python 2, Python 3, and R environments) and packages dynamically across a cluster.
Using the Anaconda Parcel¶
Refer to the Anaconda parcel documentation for more information about installing the Anaconda parcel on a CDH cluster using Cloudera Manager.
To transition from the Anaconda parcel for CDH to the dynamic, managed version of Anaconda Scale, follow the instructions below: first uninstall the Anaconda parcel from the CDH cluster, then install a centrally managed version of Anaconda.
Uninstalling the Anaconda parcel¶
If the Anaconda parcel is installed on the CDH cluster, use the following steps to uninstall the parcel.
- From the Cloudera Manager Admin Console, click the Parcels indicator in the top navigation bar.
- Click the Deactivate button to the right of the Anaconda parcel listing.
- Click OK on the Deactivate prompt to deactivate the Anaconda parcel and restart Spark and related services.
- Click the arrow to the right of the Anaconda parcel listing and choose Remove From Hosts, which opens a confirmation dialog.
- After you confirm, the Anaconda parcel is removed from the cluster nodes.
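If you prefer to script the removal, the same two actions (deactivate, then remove from hosts) can likely be triggered through the Cloudera Manager REST API. The following is a minimal sketch that only builds the request URLs. It assumes the parcel commands `deactivate` and `startRemovalOfDistribution` are exposed by your Cloudera Manager version; the host name, API version (`v19`), cluster name, parcel version, and the helper function names are hypothetical placeholders, not values from this documentation. Unlike the Admin Console prompt, this sketch does not restart Spark or related services.

```python
from urllib.parse import quote


def parcel_command_url(base_url, cluster, product, version, command):
    """Build the URL for one parcel command (assumed Cloudera Manager API layout)."""
    # Percent-encode path segments, since cluster names may contain spaces.
    cluster, product, version = (quote(p, safe="") for p in (cluster, product, version))
    return "{}/clusters/{}/parcels/products/{}/versions/{}/commands/{}".format(
        base_url.rstrip("/"), cluster, product, version, command
    )


def uninstall_urls(base_url, cluster, version, product="Anaconda"):
    """Return the URLs to POST to, in order: deactivate, then remove from hosts."""
    return [
        parcel_command_url(base_url, cluster, product, version, "deactivate"),
        parcel_command_url(base_url, cluster, product, version, "startRemovalOfDistribution"),
    ]


# Hypothetical values for illustration only; substitute your own.
EXAMPLE_URLS = uninstall_urls("http://cm.example.com:7180/api/v19", "Cluster 1", "4.0.0")
```

Each URL would be sent as an authenticated HTTP POST, waiting for the first command to finish before issuing the second. Check the API reference for your Cloudera Manager release before relying on these paths.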
For more information about managing Cloudera parcels, refer to the Cloudera documentation.
Transitioning to a centrally managed Anaconda installation¶
Once you’ve uninstalled the Anaconda parcel, refer to the Anaconda Scale installation instructions for more information about installing a centrally managed version of Anaconda.
Using Anaconda with Cloudera CDH and Spark¶
You can submit Spark jobs that run on Anaconda by setting the PYSPARK_PYTHON environment variable to the location of the Anaconda Python interpreter, for example:
$ PYSPARK_PYTHON=/opt/continuum/anaconda/bin/python spark-submit pyspark_script.py
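The same submission can be driven from a Python launcher script, which is convenient when jobs are started by a scheduler. This is a minimal sketch, not part of Anaconda Scale: the helper names `build_submit` and `submit` are hypothetical, the interpreter path is the one from the example above, and `spark-submit` is assumed to be on the `PATH` of the machine that runs the launcher.

```python
import os
import subprocess

# Path taken from the example above; adjust it to your Anaconda install location.
ANACONDA_PYTHON = "/opt/continuum/anaconda/bin/python"


def build_submit(script, python_path=ANACONDA_PYTHON, base_env=None):
    """Return (command, environment) for a spark-submit run that uses Anaconda."""
    # Copy the environment so the caller's environment is never modified.
    env = dict(os.environ if base_env is None else base_env)
    env["PYSPARK_PYTHON"] = python_path
    return ["spark-submit", script], env


def submit(script, python_path=ANACONDA_PYTHON):
    """Run spark-submit and return its exit code (requires spark-submit on PATH)."""
    command, env = build_submit(script, python_path)
    return subprocess.run(command, env=env).returncode
```

Calling `submit("pyspark_script.py")` is then equivalent to the shell command above. Keeping `build_submit` separate from `submit` lets you inspect or log the command and environment without launching a job.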