Installation requirements

In this section, we describe the requirements for installing Anaconda Enterprise into your Kubernetes environment. Please review these requirements carefully.

Because Anaconda Enterprise is a data science development and deployment platform, its resource requirements may surprise experienced administrators who are accustomed to standard microservice workloads. For a deeper understanding of these requirements, see Understanding Anaconda Enterprise system requirements. Many of the sections below refer to corresponding sections in that document.

We have created a BYOK8s pre-installation checklist to help you prepare for installation. This checklist will help you verify that your cluster is ready to receive the application, and that the necessary additional resources are provisioned. The Anaconda implementation team will review this checklist with you prior to beginning the installation process.

Administration server

Installation requires a machine with direct access to the target Kubernetes cluster and the Docker registry. The following software must be installed on this machine:

  • Helm version 3.2 or later.

  • The Kubernetes CLI tool kubectl.

  • For OpenShift clusters, the OpenShift CLI tool oc.

  • Optional: additional tools such as watch and jq, which are useful for verification and troubleshooting.

This server will also need a copy of the Anaconda Enterprise Helm chart, which will be provided to you prior to installation.

Anaconda recommends choosing a server that will remain available for ongoing management of the application. It is also useful if this server can mount the storage volume(s).

On Linux, one easy way to obtain all of these tools is to install ae5-conda, a single conda environment containing helm, kubectl, oc, jq, and several other useful Anaconda Enterprise management utilities. To install it:

  1. Download the package here.

  2. If necessary, move the package to the administration server.

  3. Execute: bash ae5-conda-latest-Linux-x86_64.sh and follow the prompts.

  4. You may need to restart your shell to add the environment to your PATH.
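Once the tools are installed, you can confirm the basics with a short script. The following is a minimal sketch, not part of ae5-conda: it assumes Python 3 is available on the administration server, and the helper names (parse_helm_version, missing_tools, check_admin_server) are hypothetical. It relies on helm version --short, which prints a string such as v3.8.2+g6e3701e.

```python
import re
import shutil
import subprocess

# Hypothetical helper: confirms the required CLIs are on PATH and that
# Helm meets the 3.2 minimum stated above.
MIN_HELM = (3, 2)


def parse_helm_version(text):
    """Return (major, minor) from output such as 'v3.8.2+g6e3701e'."""
    match = re.search(r"v(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognized Helm version string: {text!r}")
    return int(match.group(1)), int(match.group(2))


def missing_tools(tools):
    """Return the subset of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def check_admin_server(openshift=False):
    """Return a list of problems; an empty list means every check passed."""
    tools = ["helm", "kubectl"] + (["oc"] if openshift else [])
    problems = [f"not on PATH: {tool}" for tool in missing_tools(tools)]
    if shutil.which("helm") is not None:
        out = subprocess.run(
            ["helm", "version", "--short"],
            capture_output=True, text=True, check=True,
        ).stdout
        if parse_helm_version(out) < MIN_HELM:
            problems.append(f"Helm 3.2 or later required, found {out.strip()}")
    return problems
```

Calling check_admin_server(openshift=True) also checks for oc; optional tools such as watch and jq are deliberately left out of the required list.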

Supported Kubernetes versions

Anaconda Enterprise has been verified to run on Kubernetes API versions 1.15 through 1.24.

Note that for a given Kubernetes implementation, the vendor's version numbering sometimes differs from the underlying Kubernetes version. For instance, for Red Hat OpenShift Container Platform (OCP), the verified range covers OCP 4.3 (Kubernetes 1.16) through OCP 4.10 (Kubernetes 1.23).
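Because compatibility follows the underlying Kubernetes API version rather than the vendor's release number, a pre-installation check should read the server's Kubernetes version directly. This minimal sketch assumes you capture the output of kubectl version -o json and use its serverVersion.gitVersion field; the function names are hypothetical.

```python
import json
import re

# Verified range stated in this document: Kubernetes API 1.15 through 1.24.
VERIFIED_MIN = (1, 15)
VERIFIED_MAX = (1, 24)


def server_git_version(kubectl_json):
    """Extract serverVersion.gitVersion from `kubectl version -o json` output."""
    return json.loads(kubectl_json)["serverVersion"]["gitVersion"]


def parse_k8s_version(text):
    """Return (major, minor) from strings like 'v1.23.5' or 'v1.21.14-eks-fb459a0'."""
    match = re.match(r"v?(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return int(match.group(1)), int(match.group(2))


def is_verified(text):
    """True when the underlying Kubernetes version falls in the verified range."""
    return VERIFIED_MIN <= parse_k8s_version(text) <= VERIFIED_MAX
```

On OCP 4.10, for example, the server reports a Kubernetes 1.23 version string, which falls inside the verified range even though "4.10" on its own would not.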

As part of its development, testing, and customer support work, the Anaconda Enterprise team has successfully installed Anaconda Enterprise on the following variants:

The links in the above table point to some vendor-specific recommendations that you can use to refine your provisioning plans.

Our testing consistently shows that compatibility is determined by the underlying Kubernetes API version and by the cluster's ability to meet the other requirements listed in this document.

Namespace, service account, RBAC

Anaconda Enterprise should be installed in a namespace not occupied by any other applications, including other instances of Anaconda Enterprise. A service account should be created and given sufficient permissions both to complete the Helm-based installation and to enable the dynamic resource provisioning Anaconda Enterprise performs during normal operation.

The permissions Anaconda Enterprise requires are broader than those typically granted to an application that needs only read-only access to the Kubernetes API. That said, with the exception of the ingress controller, all necessary permission grants are limited to the application namespace.

See this page for Role and ClusterRole specifications that are sufficiently permissive. We encourage you to speak with the Anaconda team if you have any questions about these permissions.
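Before installing, you can confirm that the service account actually holds the grants you expect by impersonating it with kubectl auth can-i, which exits with status 0 when an action is allowed. This sketch is illustrative only: the verb/resource pairs in SAMPLE_CHECKS are a hypothetical sample, not the full list from the Role and ClusterRole specifications, and the namespace and service account names are placeholders.

```python
import subprocess

# Hypothetical sample of checks; the authoritative list is the Role and
# ClusterRole specification referenced above.
SAMPLE_CHECKS = [
    ("create", "pods"),
    ("delete", "pods"),
    ("create", "services"),
    ("create", "ingresses.networking.k8s.io"),
]


def can_i_command(verb, resource, namespace, service_account):
    """Build a kubectl auth can-i command that impersonates the service account."""
    return [
        "kubectl", "auth", "can-i", verb, resource,
        "--namespace", namespace,
        "--as", f"system:serviceaccount:{namespace}:{service_account}",
    ]


def denied(namespace, service_account, checks=SAMPLE_CHECKS):
    """Return checks that kubectl reports as not allowed (or could not evaluate)."""
    failures = []
    for verb, resource in checks:
        result = subprocess.run(
            can_i_command(verb, resource, namespace, service_account),
            capture_output=True,
            text=True,
        )
        if result.returncode != 0:
            failures.append(f"{verb} {resource}")
    return failures
```

For example, denied("anaconda-enterprise", "ae5-installer") returns the checks that failed. Impersonation with --as requires that your own account be permitted to impersonate service accounts.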

Security

Anaconda Enterprise containers can run under any fixed, non-zero UID, making the application compatible with the OCP restricted Security Context Constraint (SCC) or an equivalent non-permissive Kubernetes security context.

However, to enable the Authenticated NFS capability, which allows user containers to access certain external, authenticated file shares, user pods must be permitted to run as root (UID 0). In this configuration, the container runs in a privileged state only long enough to determine and assign the authenticated group memberships for the running user. It then drops to a non-privileged state for all further execution. Some Kubernetes administrators will rightly view this exception with concern; feel free to speak with the Anaconda team for more background and to find out whether your installation can avoid this requirement.

CPU, memory, and nodes

  • Minimum node size: 8 cores, 32GB RAM

  • Recommended: 16 cores, 64GB RAM, or larger

  • Recommended oversubscription (limits/requests ratio): 4:1

  • Minimum number of worker nodes: 3

For more information about these choices, see Hardware considerations.
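The oversubscription ratio describes how a container's resource limit relates to its request: at 4:1, a workload limited to 4 cores requests 1 core, and the scheduler reserves only the request. The sketch below encodes the minimums from the list above for capacity planning; the WorkerNode type and helper names are hypothetical, and memory is expressed in GB as in the list.

```python
from dataclasses import dataclass

# Figures taken from the list above.
MIN_CORES = 8
MIN_MEMORY_GB = 32
MIN_WORKERS = 3
OVERSUBSCRIPTION = 4  # limits/requests ratio of 4:1


@dataclass
class WorkerNode:
    name: str
    cores: int
    memory_gb: float


def undersized(nodes):
    """Names of nodes below the 8-core / 32GB minimum."""
    return [n.name for n in nodes
            if n.cores < MIN_CORES or n.memory_gb < MIN_MEMORY_GB]


def meets_minimums(nodes):
    """At least three workers, none of them undersized."""
    return len(nodes) >= MIN_WORKERS and not undersized(nodes)


def request_for_limit(limit, ratio=OVERSUBSCRIPTION):
    """At a 4:1 limits/requests ratio, the request is one quarter of the limit."""
    return limit / ratio
```

With three 16-core, 64GB workers, meets_minimums returns True; a session limited to 16 cores would request 4, so its limit is only honored while the node has spare capacity.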

Anaconda Enterprise permits the use of node labeling, taints, and tolerations in order to limit the nodes on which its containers may run. This includes the ability to specify different node sets for AE system workloads and user workloads. Any necessary affinity or toleration settings must be identified prior to installation.
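Taints keep workloads off nodes unless those workloads carry a matching toleration, while node labels and affinity steer workloads toward particular nodes; a toleration alone does not force a pod onto a tainted node. The following is a simplified model of the Kubernetes taint/toleration matching rules, intended only to help plan settings before installation. It is not how Anaconda Enterprise or the scheduler is implemented, and any taint keys you use (for example, a hypothetical dedicated=ae5-user) are your own choice.

```python
from dataclasses import dataclass


@dataclass
class Taint:
    key: str
    value: str = ""
    effect: str = "NoSchedule"  # NoSchedule, PreferNoSchedule, or NoExecute


@dataclass
class Toleration:
    key: str = ""            # empty key with operator Exists tolerates every taint
    operator: str = "Equal"  # Equal or Exists
    value: str = ""
    effect: str = ""         # empty effect matches every effect


def tolerates(tol, taint):
    """Simplified Kubernetes rule for whether one toleration matches one taint."""
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.operator == "Exists":
        return tol.key == "" or tol.key == taint.key
    return tol.key == taint.key and tol.value == taint.value


def can_schedule(taints, tolerations):
    """Every hard taint (NoSchedule, NoExecute) must be tolerated.

    PreferNoSchedule is only a soft preference, so it never blocks placement here.
    """
    hard = [t for t in taints if t.effect in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, t) for tol in tolerations) for t in hard)
```

For example, a node tainted dedicated=ae5-user:NoSchedule accepts only pods whose tolerations include that key and value, or a blanket Exists toleration.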

Resource Profiles

Users can choose among Resource Profiles that the cluster administrator creates for their workloads. Each Resource Profile specifies the amount of CPU, memory, and optionally GPU resources made available to a user workload. Anaconda recommends deciding which Resource Profiles to create before installation. See this document for further details.

Storage

A standard installation of Anaconda Enterprise requires one or two Persistent Volume Claims to be statically provisioned and bound prior to installation.

  • anaconda-storage: this volume holds our internal Postgres control database, our internal Git storage mechanism, and the internal conda package repository. If you are hosting conda packages outside of AE5, then a minimum size of 100GiB is required. However, if you intend to mirror conda packages into the AE5 repository, this will need to be sized much larger to accommodate those packages; e.g., 500GiB.

  • anaconda-persistence: this volume hosts our managed persistence storage, including custom sample projects, custom conda environments, and user code and data. Because the demands on this volume will steadily grow with usage, Anaconda recommends 1TiB of space to start.

You are free to use different names here, as long as the volumes meet the required specifications. You may also combine these into a single PersistentVolumeClaim that covers both needs, as long as that volume can simultaneously meet the performance needs of both, as outlined below. For the remainder of this section, we assume two separate volumes and refer to them by these names.

Requirements and recommendations for these volumes:

  • The anaconda-persistence volume must support ReadWriteMany access mode.

  • The anaconda-storage volume must support either the ReadWriteOnce or ReadWriteMany access mode. With ReadWriteOnce, the three AE5 pods that consume this volume (postgres, git-storage, and object-storage) must run on the same node; in our experience, this is a reasonable configuration.

  • We strongly recommend a premium performance tier if given the option.

  • The root directories of these volumes must be writable by AE5 containers. This is typically accomplished by making the volumes group-writable by a single numeric GID. We strongly recommend that this GID be 0, since this is the default GID assigned to Kubernetes containers. If this is not possible, supply the GID on the PersistentVolume as a pv.beta.kubernetes.io/gid annotation.

  • To ensure that the data on these volumes is not lost if AE5 is uninstalled, do not change the reclaim policy (persistentVolumeReclaimPolicy) from its default value of Retain.
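The access-mode, size, and reclaim-policy rules above can be checked against your PersistentVolume objects before installation. This minimal sketch assumes each spec is the spec field of kubectl get pv NAME -o json, parsed into a Python dict; it understands only Gi and Ti quantities, and REQUIREMENTS and check_pv are hypothetical names. The 1TiB figure for anaconda-persistence is a recommended starting size, so treat that finding as advisory.

```python
# Thresholds from this section; 1TiB for anaconda-persistence is a
# recommended starting size rather than a hard minimum.
REQUIREMENTS = {
    "anaconda-storage": {"modes": {"ReadWriteOnce", "ReadWriteMany"}, "min_gib": 100},
    "anaconda-persistence": {"modes": {"ReadWriteMany"}, "min_gib": 1024},
}
_UNITS = {"Gi": 1, "Ti": 1024}


def to_gib(quantity):
    """Convert a binary-suffixed quantity such as '500Gi' or '1Ti' to GiB."""
    for suffix, factor in _UNITS.items():
        if quantity.endswith(suffix):
            return float(quantity[: -len(suffix)]) * factor
    raise ValueError(f"unsupported quantity: {quantity!r}")


def check_pv(role, spec, min_gib=None):
    """Return a list of problems with a PersistentVolume `spec` dict."""
    req = REQUIREMENTS[role]
    problems = []
    if not req["modes"] & set(spec.get("accessModes", [])):
        problems.append(f"{role}: needs one of {sorted(req['modes'])}")
    size = to_gib(spec["capacity"]["storage"])
    floor = min_gib if min_gib is not None else req["min_gib"]
    if size < floor:
        problems.append(f"{role}: {size:g}GiB is below {floor}GiB")
    policy = spec.get("persistentVolumeReclaimPolicy")
    if policy != "Retain":
        problems.append(f"{role}: reclaim policy is {policy!r}, expected 'Retain'")
    return problems
```

If you plan to mirror conda packages into the AE5 repository, pass min_gib=500 when checking anaconda-storage.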

It is very common to use an NFS service, served from a cloud file service or an on-premises NAS or SAN, to provide one or both of these volumes. We have collected our NFS-specific recommendations in this document.

Ingress and Firewall

Anaconda Enterprise is compatible with most ingress controllers that are commonly deployed on Kubernetes clusters. In particular, on Kubernetes 1.19-1.21, any ingress controller with full support for the networking.k8s.io/v1 Ingress API will enable Anaconda Enterprise to build endpoints for user sessions and deployments.

Because an ingress controller is a cluster-wide resource, Anaconda recommends that the controller be installed and configured prior to the installation of Anaconda Enterprise. However, if the cluster is fully dedicated to our application, our Helm chart can be configured to install a version of the NGINX Ingress controller that is known to operate successfully on multiple Kubernetes variants, including OpenShift. Our only modification to the stock NGINX container is to enable it to run without root privileges.

It is imperative that your cluster configuration and firewall settings allow all TCP traffic between nodes, particularly HTTP (80), HTTPS (443), and the standard Postgres (5432) and Redis (6379) ports. In our experience, many apparently healthy clusters block such inter-node traffic, which disrupts the pod-to-pod communication Anaconda Enterprise requires to provision user workloads.

All external traffic to Anaconda Enterprise is funneled through the ingress controller on the standard HTTPS port, 443.
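A quick way to test the inter-node requirement is to attempt TCP connections on the ports named above from one node (or a pod running on it) to another node's address. This minimal sketch uses only the Python standard library; the port map and the hosts you probe are assumptions to adjust for your cluster.

```python
import socket

# Ports named in this section; 5432 and 6379 are the standard Postgres and Redis ports.
PORTS = {"http": 80, "https": 443, "postgres": 5432, "redis": 6379}


def probe_port(host, port, timeout=2.0):
    """Classify one TCP port as seen from the machine running this code."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Usually the host answered but nothing is listening; a firewall
        # REJECT rule can also produce this result.
        return "refused"
    except OSError:
        # Timeouts and unreachable-network errors often indicate filtering.
        return "blocked-or-unreachable"


def probe_host(host, ports=PORTS, timeout=2.0):
    """Probe every named port on `host` and return {name: result}."""
    return {name: probe_port(host, port, timeout) for name, port in ports.items()}
```

A refused result before installation is expected on ports where nothing is listening yet; repeated blocked-or-unreachable results between nodes are the pattern that points to the inter-node filtering described above.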

DNS / SSL

Anaconda Enterprise will require the following:

  • A valid, fully-qualified domain name (FQDN) reserved for AE5

  • A DNS record for the above FQDN, as well as a wildcard DNS record for its subdomains. Both records must eventually point to the IP address allocated by the ingress controller. If you will be using an existing ingress controller, you may be able to obtain this address before installation, which is ideal. Otherwise, you may need to populate the DNS records with the address after the initial installation is complete.

  • A valid wildcard SSL certificate covering the cluster FQDN and its subdomains. The installation requires both the public certificate and the private key.

  • If the certificate chain includes an intermediate certificate, the public certificate for the intermediate is required.

  • The public root certificate, if the above certificates were created with a private Certificate Authority (CA).

Wildcard DNS records and SSL certificates are required for Anaconda Enterprise to operate correctly. Some administrators object to one or both of these requirements; if so, the Anaconda team can speak with your administrators to clarify. Often the objection stems from a misunderstanding about the scope of the wildcard: administrators assume we are asking for coverage of, say, *.company.com, when in fact we require only *.anaconda.company.com.
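The scope question comes down to how wildcard names are matched: a * stands for exactly one leftmost label, so *.anaconda.company.com covers session hostnames such as abc.anaconda.company.com, but not anaconda.company.com itself and nothing elsewhere under company.com. That is why the certificate must name both the FQDN and the wildcard. The sketch below is a simplified, RFC 6125-style matcher for checking a certificate's subject alternative names; the function names and the sample subdomain are hypothetical.

```python
def wildcard_covers(pattern, hostname):
    """Simplified matching: '*' may only be the whole leftmost label
    and matches exactly one label."""
    p_labels = pattern.lower().rstrip(".").split(".")
    h_labels = hostname.lower().rstrip(".").split(".")
    if len(p_labels) != len(h_labels):
        return False
    first, *rest = p_labels
    if first != "*" and first != h_labels[0]:
        return False
    return rest == h_labels[1:]


def cert_covers_cluster(sans, fqdn, sample_subdomain="example"):
    """True if the SAN list covers both the FQDN and a one-level subdomain of it."""
    needed = [fqdn, f"{sample_subdomain}.{fqdn}"]
    return all(any(wildcard_covers(san, host) for san in sans) for host in needed)
```

cert_covers_cluster(["anaconda.company.com", "*.anaconda.company.com"], "anaconda.company.com") returns True; with the wildcard entry alone it returns False.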

Docker images

We strongly recommend copying the Anaconda Enterprise Docker images from our source repository on Docker Hub into your internal Docker registry, which ensures their availability even if connectivity to Docker Hub is interrupted. (In an air-gapped setting, this is necessary.) This registry must be accessible from the Kubernetes cluster where AE5 is installed.

Here are the images you will need. A precise manifest, including the exact version numbers, and the credentials required to pull these images from our authenticated repository, will be provided to you prior to installation.

  • The AE5 system images, all from the aedev/ Docker Hub channel:

    ae-app-proxy
    ae-auth
    ae-auth-api
    ae-auth-escrow
    ae-deploy
    ae-docs
    ae-editor
    ae-git-proxy
    ae-git-storage
    ae-object-storage
    ae-operation-controller
    ae-operation-create-project
    ae-repository
    ae-storage
    ae-sync
    ae-ui
    ae-workspace
    ae-wagonwheel
    ae-auth-keycloak
    ae-nginx-ingress-v1

  • One image each for Postgres and Redis. Currently these are the only images certified for use with AE5:

    Postgres: postgres:12.13
    Redis: redis:7.0.8

Note that the Docker images used by Anaconda Enterprise are larger than many Kubernetes administrators are accustomed to. For more background, see Docker image sizes.
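Mirroring the images amounts to a pull, tag, and push for each one. This minimal sketch only generates the docker commands for review; it assumes you have already run docker login against both Docker Hub (with the credentials Anaconda provides) and your internal registry. The registry path registry.example.com/ae5 is a placeholder, and the AE5 tag must come from the manifest rather than being guessed.

```python
# Image names from the list above. The AE5 tag is supplied by the caller
# from the manifest Anaconda provides; nothing here guesses it.
AE5_IMAGES = """
ae-app-proxy ae-auth ae-auth-api ae-auth-escrow ae-deploy ae-docs ae-editor
ae-git-proxy ae-git-storage ae-object-storage ae-operation-controller
ae-operation-create-project ae-repository ae-storage ae-sync ae-ui
ae-workspace ae-wagonwheel ae-auth-keycloak ae-nginx-ingress-v1
""".split()
THIRD_PARTY = ["postgres:12.13", "redis:7.0.8"]


def mirror_commands(source_ref, target_registry):
    """Return docker pull/tag/push commands that copy one image."""
    target = f"{target_registry}/{source_ref.rsplit('/', 1)[-1]}"
    return [
        f"docker pull {source_ref}",
        f"docker tag {source_ref} {target}",
        f"docker push {target}",
    ]


def all_mirror_commands(target_registry, ae5_tag):
    """Commands for every AE5 system image plus the Postgres and Redis images."""
    refs = [f"aedev/{name}:{ae5_tag}" for name in AE5_IMAGES] + THIRD_PARTY
    return [cmd for ref in refs for cmd in mirror_commands(ref, target_registry)]
```

Printing all_mirror_commands("registry.example.com/ae5", ae5_tag) produces three commands per image that you can review before running them; generating rather than executing keeps the large transfers under your control.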

GPU Information

This release of Anaconda Enterprise supports NVIDIA Data Center drivers up to CUDA 11.6. We have directly tested the following GPU cards:

  • Tesla V100

  • Tesla P100

We have not tested the other cards supported by this driver; however, we expect all of the cards below to work with your BYOK8s cluster, provided the proper installation steps are followed.

  • A-Series: NVIDIA A100, NVIDIA A40, NVIDIA A30, NVIDIA A10

  • RTX-Series: RTX 8000, RTX 6000, NVIDIA RTX A6000, NVIDIA RTX A5000, NVIDIA RTX A4000, NVIDIA T1000, NVIDIA T600, NVIDIA T400

  • HGX-Series: HGX A100, HGX-2

  • T-Series: Tesla T4

  • P-Series: Tesla P40, Tesla P6, Tesla P4

  • K-Series: Tesla K80, Tesla K520, Tesla K40c, Tesla K40m, Tesla K40s, Tesla K40st, Tesla K40t, Tesla K20Xm, Tesla K20m, Tesla K20s, Tesla K20c, Tesla K10, Tesla K8

  • M-Class: M60, M40 24GB, M40, M6, M4

As discussed on this page, GPU support in Kubernetes is itself a work in progress, and each cloud vendor provides different recommendations. In addition, Red Hat OpenShift Service on AWS (ROSA) does not yet support GPUs.