Installing Livy server for Hadoop Spark access

To support your organization’s data analysis operations, Anaconda Enterprise enables platform users to connect to remote Apache Hadoop or Spark clusters. Anaconda Enterprise uses Apache Livy to handle session management and communication with Apache Spark clusters, including different versions of Spark, independent clusters, and even different types of Hadoop distributions.

Livy supports the authentication layers that Hadoop administrators are used to, including Kerberos. Anaconda Enterprise also authenticates to HDFS with Kerberos, and Kerberos impersonation must be enabled.

When Livy is installed, users can connect to a remote Spark cluster when creating projects by selecting the Spark template. They can either use the Python libraries available on the platform, or package a specific environment to target for the job. For more information, see Hadoop / Spark.

Before you begin:

Verify the connection requirements. The following table outlines the supported configurations for connecting to remote Hadoop and Spark clusters with Anaconda Enterprise.

Software              Version
Hadoop and HDFS       2.6.0+
Spark and Spark API   1.6+ and 2.X
Sparkmagic            0.12.7
Livy                  0.5
Hive                  1.1.0+
Impala                2.11+
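As a rough aid for checking a cluster against the minimum versions in the table above, the comparison can be sketched in Python. This is only an illustration: the `MINIMUMS` mapping and the function names are hypothetical, the sketch covers only the entries marked with "+", and it assumes version strings begin with a dotted numeric part (for example `2.6.0-cdh5.16.2`).

```python
import re

# Minimum versions copied from the table above (entries marked with "+").
# The dictionary name and keys are illustrative, not part of any product.
MINIMUMS = {
    "hadoop": (2, 6, 0),
    "hive": (1, 1, 0),
    "impala": (2, 11),
}


def parse_version(text):
    """Return the leading dotted numeric part of a version string as ints."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no numeric version found in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(component, version):
    """Return True if `version` is at least the minimum for `component`."""
    minimum = MINIMUMS[component]
    found = parse_version(version)
    # Pad the shorter tuple with zeros so that "2.6" compares equal to "2.6.0".
    width = max(len(minimum), len(found))
    found += (0,) * (width - len(found))
    minimum += (0,) * (width - len(minimum))
    return found >= minimum
```

For example, `meets_minimum("hadoop", "2.6.0-cdh5.16.2")` compares `(2, 6, 0)` against the listed minimum and ignores the distribution suffix.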

Note

The Hive metastore may be Postgres or MySQL. The Livy server must run on an “edge node” or client in the Hadoop/Spark cluster. Verify that spark-submit and/or the Spark REPL commands (such as spark-shell or pyspark) work on this machine.
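A quick first check on the edge node is whether the Spark commands are on the `PATH` at all. The sketch below uses only the Python standard library; the function name is hypothetical. It confirms that the executables can be found, not that they can actually reach the cluster, so running a small job with spark-submit is still the real test.

```python
import shutil


def find_spark_commands(names=("spark-submit", "spark-shell", "pyspark")):
    """Map each command name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_spark_commands().items():
        print(f"{name}: {path or 'not found on PATH'}")
```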

Installing Livy

Follow the instructions below to install Livy into an existing Spark cluster, or download and install the official version of Livy.

Note

This example is specific to a Red Hat-based Linux distribution, with a Hadoop installation based on Cloudera CDH. To use other systems, you’ll need to look up the corresponding commands and locations.

  1. Locate the directory that contains Anaconda Livy. Typically this will be anaconda-enterprise-X.X.X-X.X/installer/anaconda-livy-0.5.0, where X.X.X-X.X corresponds to the Anaconda Enterprise version.

  2. Copy the entire directory that contains Anaconda Livy to an edge node on the Spark/Hadoop cluster.
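One way to carry out the copy in step 2 is with `scp -r`. The sketch below only builds the command so it can be reviewed before running; the user name, host name, and destination directory are placeholders, not values from this guide, and the `X.X.X-X.X` portion of the path must be replaced with your Anaconda Enterprise version.

```python
import shlex


def build_copy_command(source_dir, user, host, dest_dir):
    """Build an scp command that recursively copies source_dir to the edge node."""
    return ["scp", "-r", source_dir, f"{user}@{host}:{dest_dir}"]


if __name__ == "__main__":
    command = build_copy_command(
        "anaconda-enterprise-X.X.X-X.X/installer/anaconda-livy-0.5.0",
        "admin",                  # placeholder user on the edge node
        "edge-node.example.com",  # placeholder edge node host name
        "/opt/",                  # placeholder destination directory
    )
    # Print the command for review; pass the list to subprocess.run to execute it.
    print(shlex.join(command))
```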

After installing Livy server, you’ll need to configure it to work with Anaconda Enterprise. For example, you’ll need to enable impersonation, so users running Spark sessions are able to log in to each machine in the Spark cluster. For more information, see Configuring Livy server for Hadoop Spark access.
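To illustrate what impersonation means at the Livy level, the sketch below builds (but does not send) a request to Livy's REST API that asks for a PySpark session to run as a given user via the `proxyUser` field. The host name and user name are placeholders, 8998 is Livy's default port, and the function name is hypothetical. On a Kerberos-enabled cluster the request would also need SPNEGO authentication, which this sketch does not handle; Anaconda Enterprise normally makes these calls for you through Sparkmagic.

```python
import json
import urllib.request


def build_session_request(livy_url, proxy_user, kind="pyspark"):
    """Build a POST /sessions request asking Livy to run a session as proxy_user."""
    payload = {"kind": kind, "proxyUser": proxy_user}
    return urllib.request.Request(
        url=livy_url.rstrip("/") + "/sessions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder URL and user; nothing is sent over the network here.
    request = build_session_request("http://livy.example.com:8998", "alice")
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```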