Frequently asked questions

General

When was the general availability (GA) release of Workbench v5?

Our GA release was August 31, 2017 (version 5.0.3). Our most recent version was released February 28, 2023 (version 5.6.1).

Which integrated development environments (IDEs) can I use with Workbench?

Workbench supports Jupyter Notebooks and JupyterLab, which are the most popular integrated data science environments for working with Python and R notebooks. You can also install RStudio and VSCode and use them to build your Workbench projects.

Can I deploy multiple data science applications to Workbench?

Yes, you can deploy multiple data science applications and languages across a Workbench cluster. Each data science application runs in a secure and isolated environment with all of the dependencies from Anaconda that it requires.

A single node can run multiple applications based on the amount of compute resources (CPU and RAM) available on a given node. Workbench handles all of the resource allocation and application scheduling for you.

Does Workbench support high availability deployments?

Partially. When Workbench is installed on three or more nodes, some Workbench services and user-deployed apps are automatically configured for high availability. Workbench also provides several automatic mechanisms for fault tolerance and service continuity, including automatic restarts, health checks, and service migration.

For more information, see Fault tolerance in Workbench.

Which identity management and authentication protocols does Workbench support?

Workbench comes with out-of-the-box support for the following:

  • LDAP / AD

  • SAML

  • Kerberos

For more information, see Connecting to external identity providers.

Does Workbench support two-factor authentication (including one-time passwords)?

Yes, Workbench supports single sign-on (SSO) and two-factor authentication (2FA) using FreeOTP, Google Authenticator, or any Google Authenticator-compatible 2FA app.

You can configure one-time password policies in Workbench by navigating to the authentication center and clicking on Authentication and then OTP Policy.

System requirements

What operating systems are supported for Workbench?

Please see operating system requirements.

Note

Linux distributions other than those listed in the documentation can be supported on request.

What are the minimum system requirements for Workbench nodes?

Please see system requirements.

Which browsers are supported for Workbench?

Please see browser requirements.

Does Workbench come with a version control system?

Yes, Workbench includes an internal Git server, which allows users to save and commit versions of their projects.

Can Workbench integrate with my own Git server?

Yes, as described in Connecting to an external version control repository.

Installation

How do I install Workbench?

The Workbench installer is a single tarball that includes Docker, Kubernetes, system dependencies, and all of the components and images necessary to run Workbench. The system administrator runs one command on each node.

Can Workbench be installed on-premises?

Yes, including air-gapped environments.

Can Workbench be installed on cloud environments?

Yes, including Amazon AWS, Microsoft Azure, and Google Cloud Platform.

Does Workbench support air-gapped (offline) environments?

Yes, the Workbench installer includes Docker, Kubernetes, system dependencies, and all of the components and images necessary to run Workbench on-premises or on a private cloud, with or without internet connectivity. We can deliver the installer to you on a USB drive.

Can I build Docker images for the install of Workbench?

No. The installation of Workbench is supported only by using the single-file installer. The Workbench installer includes Docker, Kubernetes, system dependencies, and all of the components and images necessary for Workbench.

Can I install Workbench on my own instance of Kubernetes?

Yes, please refer to our BYOK8s environment preparation guide.

Can I get the Workbench installer packaged as a virtual machine (VM), Amazon Machine Image (AMI) or other installation package?

No. The installation of Workbench is supported only by using the single-file installer.

Which ports are externally accessible from Workbench?

Please see network requirements.

Can I use Workbench to connect to my Hadoop/Spark cluster?

Yes. Workbench supports connectivity from notebooks to local or remote Spark clusters by using the Sparkmagic client and a Livy REST API server. Workbench provides Sparkmagic, which includes Spark, PySpark, and SparkR notebook kernels for deployment.
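
For orientation, Sparkmagic reaches the cluster by sending HTTP requests to Livy. The sketch below is not how Workbench notebooks are configured. It only builds the two JSON requests that Livy's REST API accepts (create a session, then submit a statement) so the moving parts are visible. The host name livy.example.com is hypothetical, port 8998 is Livy's default, and session ID 0 is assumed for illustration.

```python
import json
import urllib.request

# Hypothetical Livy endpoint; replace with your cluster's Livy server.
LIVY_URL = "http://livy.example.com:8998"


def livy_request(path, payload, base_url=LIVY_URL):
    """Build (but do not send) a JSON POST request for the Livy REST API."""
    return urllib.request.Request(
        url=base_url + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Ask Livy for a PySpark session, then queue a statement in session 0.
create_session = livy_request("/sessions", {"kind": "pyspark"})
run_statement = livy_request("/sessions/0/statements", {"code": "sc.version"})

# Sending is left to the caller, for example:
#   with urllib.request.urlopen(create_session) as resp:
#       print(json.load(resp))
```

In a notebook, the Sparkmagic kernels handle these calls for you; the sketch is only meant to show what travels between the notebook and the Livy server.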

How can I manage Anaconda packages on my Hadoop/Spark cluster?

An administrator can generate custom Anaconda parcels for Cloudera CDH or custom Anaconda management packs for Hortonworks HDP using Workbench. A data scientist can use these Anaconda libraries from a notebook as part of a Spark job.

On how many nodes can I install Workbench?

You can install Workbench in the following configurations during the initial installation:

  • One node (one master node)

  • Two nodes (one master node, one worker node)

  • Three nodes (one master node, two worker nodes)

  • Four nodes (one master node, three worker nodes)

After the initial installation, you can add or remove worker nodes from the Workbench cluster at any time.

One node serves as the master node and writes storage to disk, and the other nodes serve as worker nodes. Workbench services and user-deployed applications run seamlessly on the master and worker nodes.

Can I generate certificates manually?

Yes, if automatic TLS/SSL certificate generation fails for any reason, you can generate the certificates manually. Follow these steps:

  1. Generate self-signed temporary certificates. On the master node, run:

    cd path/to/Anaconda/Enterprise/unpacked/installer
    cd DIY-SSL-CA
    bash create_noprompt.sh DESIRED_FQDN
    cp out/DESIRED_FQDN/secret.yaml /var/lib/gravity/planet/share/secrets.yaml
    

    Replace DESIRED_FQDN with the fully qualified domain name of the cluster on which you are installing Workbench.

    Saving this file as /var/lib/gravity/planet/share/secrets.yaml on the Workbench master node makes it accessible as /ext/share/secrets.yaml within the Workbench environment, which you can enter with the command sudo gravity enter.

  2. Update the certs secret.

    Replace the built-in certs secret with the contents of secrets.yaml. Enter the Workbench environment and run these commands:

    $ kubectl delete secrets certs
    secret "certs" deleted
    $ kubectl create -f /ext/share/secrets.yaml
    secret "certs" created
    

GPU Support

How can I make GPUs available to my team of data scientists?

If your data science team plans to use version 5.2 of the Workbench AI enablement platform, here are a few approaches to consider when planning your GPU cluster:

  • Build a dedicated GPU-only cluster.

    If GPUs will be used by specific teams only, creating a separate cluster allows you to more carefully control GPU access.

  • Build a heterogeneous cluster.

    Not all projects require GPUs, so a cluster containing a mix of worker nodes—with and without GPUs—can serve a variety of use cases in a cost-effective way.

  • Add GPU nodes to an existing cluster.

    If your team’s resource requirements aren’t clearly defined, you can start with a CPU-only cluster, and add GPU nodes to create a heterogeneous cluster when the need arises.

Workbench supports heterogeneous clusters by allowing you to create different “resource profiles” for projects. Each resource profile describes the number of CPU cores, the amount of memory, and the number of GPUs the project needs. Administrators typically will create “Regular”, “Large”, and “Large + GPU” resource profiles for users to select from when running their project. If a project requires a GPU, Workbench will run it on only those cluster nodes with an available GPU.
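
The matching rule described above can be sketched as a simple filter. This is an illustration of the idea only, not Workbench's scheduler: the class names, field names, node names, and all of the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceProfile:
    """What a project asks for (field names are illustrative)."""
    name: str
    cpu_cores: int
    memory_gb: int
    gpus: int = 0


@dataclass(frozen=True)
class Node:
    """Free capacity on one cluster node (illustrative)."""
    name: str
    free_cpu_cores: int
    free_memory_gb: int
    free_gpus: int = 0


def eligible_nodes(profile, nodes):
    """Return the names of nodes with enough free CPU, memory, and GPUs."""
    return [
        node.name
        for node in nodes
        if node.free_cpu_cores >= profile.cpu_cores
        and node.free_memory_gb >= profile.memory_gb
        and node.free_gpus >= profile.gpus
    ]


# Made-up profiles mirroring the "Regular", "Large", and "Large + GPU" names.
PROFILES = {
    "Regular": ResourceProfile("Regular", cpu_cores=1, memory_gb=4),
    "Large": ResourceProfile("Large", cpu_cores=4, memory_gb=16),
    "Large + GPU": ResourceProfile("Large + GPU", cpu_cores=4, memory_gb=16, gpus=1),
}

NODES = [
    Node("cpu-worker-1", free_cpu_cores=8, free_memory_gb=32),
    Node("gpu-worker-1", free_cpu_cores=8, free_memory_gb=61, free_gpus=1),
]

print(eligible_nodes(PROFILES["Regular"], NODES))      # both nodes qualify
print(eligible_nodes(PROFILES["Large + GPU"], NODES))  # only the GPU node qualifies
```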

What software is GPU accelerated?

Anaconda provides a number of GPU-accelerated packages for data science. For deep learning, these include:

  • Keras (keras-gpu)

  • TensorFlow (tensorflow-gpu)

  • Caffe (caffe-gpu)

  • PyTorch (pytorch)

  • MXNet (mxnet-gpu)

For boosted decision tree models:

  • XGBoost (py-xgboost-gpu)

For more general array programming, custom algorithm development, and simulations:

  • CuPy (cupy)

  • Numba (numba)

Note

Unless a package has been specifically optimized for GPUs (by the authors) and built by Anaconda with GPU support, it will not be GPU-accelerated, even if the hardware is present.
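
One way to check the hardware half of this note from inside a session is to ask Numba whether it can see a CUDA device. This is a minimal sketch, assuming the numba package may or may not be installed. It says nothing about whether any other package was built with GPU support.

```python
def cuda_gpu_available():
    """Return True if Numba can see a CUDA GPU, False otherwise."""
    try:
        from numba import cuda  # only importable if numba is installed
    except ImportError:
        return False
    return bool(cuda.is_available())


if cuda_gpu_available():
    print("A CUDA GPU is visible to Numba in this session.")
else:
    print("No CUDA GPU is visible to Numba in this session.")
```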

What hardware does each of my cluster nodes require?

Anaconda recommends installing Workbench in a cluster configuration. Each installation should have an odd number of master nodes, and Anaconda recommends at least one worker node. The master node runs all Workbench core services and does not need a GPU.

Using EC2 instances, a minimal configuration is one master node running on a m4.4xlarge instance and one GPU worker node running on a p3.2xlarge instance. More users will require more worker nodes—and possibly a mix of CPU and GPU worker nodes.

See Hardware requirements for the baseline requirements for Workbench.

How many GPUs does my cluster need?

A best practice for machine learning is for each user to have exclusive use of their GPU(s) while their project is running. This ensures they have sufficient GPU memory available for training, and provides more consistent performance.

When a Workbench user launches a notebook session or deployment that requires GPUs, those resources are reserved for as long as the project is running. When the notebook session or deployment is stopped, the GPUs are returned to the available pool for another user to claim.

The number of GPUs required in the cluster can therefore be determined by the number of concurrently running notebook sessions and deployments that are expected. Adding nodes to a Workbench cluster is straightforward, so organizations can start with a conservative number of GPUs and grow as demand increases.
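
As a rough worked example of that sizing rule, the arithmetic can be written out as follows. The function names and the example numbers are hypothetical, and the node count is a lower bound: a session that needs several GPUs is assumed to fit on a single node.

```python
import math


def gpus_required(concurrent_gpu_sessions, gpus_per_session=1):
    """Total GPUs needed if each running session holds its GPUs exclusively."""
    return concurrent_gpu_sessions * gpus_per_session


def gpu_nodes_required(total_gpus, gpus_per_node):
    """Minimum number of GPU worker nodes needed to host total_gpus."""
    return math.ceil(total_gpus / gpus_per_node)


# Example: 6 concurrent GPU sessions, 1 GPU each, on nodes with 4 GPUs apiece.
total = gpus_required(6, gpus_per_session=1)
print(total, gpu_nodes_required(total, gpus_per_node=4))  # prints "6 2"
```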

To get more out of your GPU resources, Workbench supports scheduling and running unattended jobs. This enables you to execute periodic retraining tasks—or other resource-intensive tasks—after regular business hours, or at times GPUs would otherwise be idle.

What kind of GPUs should I use?

Although the Anaconda Distribution supports a wide range of NVIDIA GPUs, Workbench deployments for data science teams developing models should use one of the following GPUs:

  • Tesla V100 (recommended)

  • Tesla P100 (adequate)

Can I mix GPU models in one cluster?

Kubernetes cannot currently distinguish between different GPU models in the same cluster node, so Workbench requires all GPU-enabled nodes within a given cluster to have the same GPU model (for example, all Tesla V100). Different clusters (e.g., “production” and “development”) can use different GPU models, of course.

Can I use cloud GPUs?

Yes, Workbench 5.2 can be installed on cloud VMs with GPU support. Amazon Web Services (AWS), Google Cloud Platform, and Microsoft Azure all offer Tesla GPU options.

Anaconda Project

What operating systems and Python versions are supported for Anaconda Project?

Anaconda Project supports Windows, macOS and Linux, and tracks the latest Anaconda releases with Python 2.7, 3.5, 3.6, and 3.7.

How is encapsulation with Anaconda Project different from creating a workspace or project in Spyder, PyCharm, or other IDEs?

A workspace or project in an IDE is a directory of files on your desktop. Anaconda Project encapsulates those files, but also includes additional parameters to describe how to run a project with its dependencies. Anaconda Project is portable and allows users to run, share, and deploy applications across different operating systems.

What types of projects can I deploy?

Anaconda Project is very flexible and can deploy many types of projects with conda or pip dependencies. Deployable projects include:

  • Notebooks (Python and R)

  • Bokeh applications and dashboards

  • REST APIs in Python and R (including machine learning scoring and predictions)

  • Python and R scripts

  • Third-party apps, web frameworks, and visualization tools such as TensorBoard, Flask, Falcon, deck.gl, Plotly Dash, and more.

Any generic Python or R script or web app can be configured to serve on port 8086, which makes the app appear in Workbench when it is deployed.
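
As a minimal sketch of what serving on port 8086 can look like, the following uses only the Python standard library. The handler name, the response text, and the choice to listen on all interfaces are assumptions made for illustration. How the command is registered in anaconda-project.yml is not shown here.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

PORT = 8086  # the port mentioned above for deployed apps


class HelloHandler(BaseHTTPRequestHandler):
    """Answer every GET with a short plain-text message."""

    def do_GET(self):
        body = b"Hello from a deployed app\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port=PORT):
    """Create a server listening on all interfaces at the given port."""
    return HTTPServer(("0.0.0.0", port), HelloHandler)


# To run the app, call: make_server().serve_forever()
```

Any web framework that can bind to the same port should work equally well; the standard library is used here only to keep the sketch dependency-free.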

Does Workbench include Docker images for my data science projects?

Workbench includes data science application images for the editor and deployments. You can install additional packages in either environment using Anaconda Project. Anaconda Project includes the information required to reproduce the project environment with Anaconda, including Python, R, or any other conda package or pip dependencies.

After upgrading Workbench, my projects no longer work. What should I do?

If you’ve upgraded to Workbench 5.4 and are getting package install errors, you may need to rewrite your anaconda-project.yml file.

If you were using modified template anaconda-project.yml files for Python 2.7, 3.5, or 3.6, it is best to leave the package list in the env_specs section empty and add your required packages and their versions to the global package list instead.

Here’s an example using the Python 3.6 template anaconda-project.yml file from Workbench version 5.3.1 where the package list has been removed from the env_specs and the required packages added to the global list.

name: Python 3.6

description: A comprehensive project template that contains all of the packages available in the Anaconda Distribution v5.0.1 for Python 3.6. Get started with the most popular and powerful packages in data science.

channels: []
packages:
  - python=3.6
  - notebook
  - pandas=0.25
  - psycopg2
  - holoviews

platforms:
  - linux-64
  - osx-64
  - win-64

env_specs:
  anaconda50_py36:
    packages: []
    channels: []

Notebooks

Are the deployed, self-service notebooks read-only?

Yes, the deployed versions of self-service notebooks are read-only, but they can be executed by collaborators or viewers. Owners of the project that contains the notebooks can edit the notebooks and deploy (or re-deploy) them.

What happens when other people run the notebook? If the notebook writes to a file, is that file overwritten?

A deployed, self-service notebook is read-only but can be executed by other collaborators or viewers. If multiple users are running a notebook that writes to a file, the file will be overwritten unless the notebook is configured to write data based on a username or other environment variable.
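
One way to avoid the overwrite is to build the output path from a variable that identifies the user. The sketch below is an assumption-laden illustration: the variable name NOTEBOOK_USER is hypothetical, so substitute whichever environment variable identifies the user in your deployment.

```python
import os
from pathlib import Path


def per_user_output_path(base_dir, filename, env=None):
    """Build an output path that includes the current user's name."""
    env = os.environ if env is None else env
    # NOTEBOOK_USER is a hypothetical variable name; USER is a common fallback.
    user = env.get("NOTEBOOK_USER") or env.get("USER") or "anonymous"
    return Path(base_dir) / user / filename


path = per_user_output_path("output", "results.csv", env={"NOTEBOOK_USER": "alice"})
print(path.as_posix())  # prints "output/alice/results.csv"
```

Each user then writes to a separate directory, so concurrent runs of the same notebook do not replace each other's results.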

Can I define environment variables as part of my data science project?

Yes, Anaconda Project supports environment variables that can be defined when deploying a data science application. Only project collaborators can view or edit environment variables, and they cannot be accessed by viewers.

How are Anaconda Project and Workbench available?

Anaconda Project is free and open-source. Workbench is a commercial product.

Where can I find example projects for Workbench?

Sample projects are included as part of the Workbench installation. They provide sample workflows and notebooks for Python and R, covering financial modeling, natural language processing, machine learning models with REST APIs, interactive Bokeh applications and dashboards, image classification, and more.

The sample projects include examples with visualization tools (Bokeh, deck.gl), pandas, SciPy, Shiny, TensorFlow, TensorBoard, XGBoost, and many other libraries. Users can save the sample projects to their Workbench account or download them to their local machine.

Does Workbench support batch scoring with REST APIs?

Yes, Workbench can be used to deploy machine learning models with REST APIs (including Python and R) that can be queried for batch scoring workflows. The REST APIs can be made available to other users and accessed with an API token.
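
A client-side batch scoring call might look like the sketch below. Everything specific in it is an assumption: the deployment URL, the JSON payload shape, and the use of a Bearer token in the Authorization header are illustrative and should be checked against your own deployment.

```python
import json
import urllib.request


def build_scoring_request(endpoint_url, api_token, records):
    """Build (but do not send) a POST request carrying a batch of records."""
    return urllib.request.Request(
        url=endpoint_url,
        data=json.dumps({"records": records}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_token}",  # assumed token scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_scoring_request(
    "https://model.workbench.example.com/predict",  # hypothetical deployment URL
    "YOUR_API_TOKEN",
    [{"feature_a": 1.0, "feature_b": 2.5}, {"feature_a": 0.3, "feature_b": 4.1}],
)

# Sending is left to the caller, for example:
#   with urllib.request.urlopen(request) as resp:
#       predictions = json.load(resp)
```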

Does Workbench provide tools to help define and implement REST APIs?

Yes. A data scientist can expose a model as a REST API with little API development work. Workbench includes an API wrapper for Python frameworks that builds on top of existing web frameworks in Anaconda, making it easy to expose your existing data science models with minimal code. You can also deploy REST APIs using existing API frameworks for Python and R.

Help and training

Do you offer support for Workbench?

Yes! You can submit support tickets to your Technical Account Manager (TAM) if you need assistance.

Do you offer training for Workbench?

Yes, we offer product training for collaborative, end-to-end data science workflows with Workbench.

Do you have a question not answered here?

Please contact us for more information.