Apache Livy and Workbench

To support your organization’s data analysis operations, Data Science & AI Workbench enables platform users to connect to remote Apache Hadoop or Spark clusters. Workbench uses Apache Livy to handle session management and communication with Spark clusters. This covers different versions of Spark, independent clusters, and even different types of Hadoop distributions, such as those installed by different Cloudera Data Platform (CDP) Parcel versions.
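As an illustration of the session management Livy performs, the sketch below builds the two requests from Livy's REST API that a client typically issues: `POST /sessions` to start an interactive session and `POST /sessions/{id}/statements` to run code in it. It is not how Workbench itself is implemented. The host name `livy.example.com` and the helper function names are assumptions made for this example; 8998 is Livy's default port. The requests are only constructed, not sent.

```python
import json
from urllib.request import Request

# Hypothetical Livy endpoint (assumption); replace with your cluster's address.
LIVY_URL = "http://livy.example.com:8998"


def build_create_session_request(livy_url: str, kind: str = "pyspark") -> Request:
    """Build (but do not send) a POST /sessions request that starts a Livy session."""
    body = json.dumps({"kind": kind}).encode("utf-8")
    return Request(
        url=f"{livy_url}/sessions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_statement_request(livy_url: str, session_id: int, code: str) -> Request:
    """Build (but do not send) a request that runs a code snippet in a session."""
    body = json.dumps({"code": code}).encode("utf-8")
    return Request(
        url=f"{livy_url}/sessions/{session_id}/statements",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    create = build_create_session_request(LIVY_URL)
    print(create.get_method(), create.full_url)
    run = build_statement_request(LIVY_URL, 0, "spark.range(10).count()")
    print(run.get_method(), run.full_url)
```

Against a reachable Livy server, each request could be sent with `urllib.request.urlopen`. The session response is JSON that includes an `id` and a `state`, and statements are normally submitted only once the session reports the `idle` state.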

Livy provides all the authentication layers that Hadoop administrators are familiar with, including Kerberos. Workbench can also authenticate to a Hadoop Distributed File System (HDFS) using Kerberos when Kerberos impersonation is enabled.
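At the Livy REST API level, impersonation is requested through the `proxyUser` field of the session-creation body. The following minimal sketch builds that JSON body; the function name `session_payload` and the user name `alice` are hypothetical. It does not perform Kerberos authentication itself, and whether the server honors `proxyUser` depends on cluster-side settings (for example, `livy.impersonation.enabled` in the Livy configuration and the Hadoop proxy-user rules), which your Hadoop administrator controls.

```python
import json
from typing import Optional


def session_payload(kind: str = "pyspark", proxy_user: Optional[str] = None) -> str:
    """Return the JSON body for POST /sessions, optionally asking Livy to
    run the session as another user via the proxyUser field."""
    payload = {"kind": kind}
    if proxy_user is not None:
        # proxyUser names the user to impersonate when the session starts.
        payload["proxyUser"] = proxy_user
    return json.dumps(payload)


if __name__ == "__main__":
    # "alice" is a placeholder user name for illustration only.
    print(session_payload(proxy_user="alice"))
```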

Selecting a Spark template when creating a project connects users to the remote Spark cluster where Livy is installed. They can use the Python libraries available through the platform or package a specific environment for the job. For more information, see Hadoop / Spark.

Tested Versions:

Workbench has been verified against the following versions.

| Software | Version |
| --- | --- |
| Hadoop (includes YARN and HDFS) | 3.1.1 |
| Spark | 2.4.7 |
| Hive | 3.1.3000 |
| Impala | 3.4.0 |
| Livy | 0.7.1-incubating |

Note

Workbench has also been verified against Cloudera Data Platform 7.1.7.