Preparing a Gravity environment for Workbench#
Determining the resource requirements for a Kubernetes cluster depends on a number of factors, including the types of applications you plan to run, the number of users active at once, and the workloads you will manage within the cluster. Data Science & AI Workbench’s performance is tightly coupled with the health of your Kubernetes stack, so it is important to allocate enough resources to handle your users’ workloads. Generally speaking, your system should contain at least 1 CPU, 1GB of RAM, and 5GB of disk space for each project session or deployment.
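For example, a cluster expected to support 20 concurrent project sessions or deployments should have at least 20 CPUs, 20GB of RAM, and 100GB of disk space available for user workloads, in addition to the baseline node requirements listed below.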
To install Workbench successfully, your systems must meet or exceed the requirements listed below. Anaconda has created a pre-installation checklist to help you prepare for installation. The checklist verifies that your cluster has the necessary resources reserved and is ready for a Workbench installation. Anaconda’s Implementation team will review the checklist with you prior to your installation.
You can initially install Workbench on up to five nodes. Once initial installation is complete, you can add or remove nodes as needed. Anaconda recommends having one master and one worker node per cluster. For more information, see Adding and removing nodes.
For historical information and details regarding Anaconda’s policies related to Gravity, see our Gravity update policy.
Hardware requirements#
Anaconda’s hardware recommendations ensure a reliable and performant Kubernetes cluster.
The following are minimum specifications for the master and worker nodes, as well as the entire cluster.
| Master node | Minimum |
|---|---|
| CPU | 16 cores |
| RAM | 64GB |
| Disk space in /opt/anaconda | 500GB |
| Disk space in /var/lib/gravity | 300GB |
| Disk space in /tmp or $TMPDIR | 50GB |
Note
- Disk space reserved for /var/lib/gravity is utilized as additional space to accommodate upgrades. Anaconda recommends having this available during installation.
- The /var/lib/gravity volume must be mounted on local storage. Core components of Kubernetes run from this directory, some of which are extremely intolerant of disk latency. Therefore, Network-Attached Storage (NAS) and Storage Area Network (SAN) solutions are not supported for this volume.
- Disk space reserved for /opt/anaconda is utilized for project and package storage (including mirrored packages).
- Anaconda recommends that you set up the /opt/anaconda and /var/lib/gravity partitions using Logical Volume Management (LVM) to provide the flexibility needed to accommodate easier future expansion.
- Currently, /opt and /opt/anaconda must be an ext4 or xfs filesystem, and cannot be an NFS mountpoint. Subdirectories of /opt/anaconda may be mounted through NFS. For more information, see Mounting an external file share.
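If you follow the LVM recommendation above, the general approach is to create a volume group on a dedicated device and carve logical volumes out of it for /opt/anaconda and /var/lib/gravity. The device and volume names below are hypothetical examples; adjust the names and sizes for your environment:
# Hypothetical device (/dev/sdb) and volume group/logical volume names
sudo pvcreate /dev/sdb
sudo vgcreate anaconda-vg /dev/sdb
sudo lvcreate -L 500G -n opt-anaconda anaconda-vg
sudo lvcreate -L 300G -n var-lib-gravity anaconda-vg
sudo mkfs.xfs -n ftype=1 /dev/anaconda-vg/opt-anaconda
sudo mkfs.xfs -n ftype=1 /dev/anaconda-vg/var-lib-gravity
sudo mkdir -p /opt/anaconda /var/lib/gravity
sudo mount /dev/anaconda-vg/opt-anaconda /opt/anaconda
sudo mount /dev/anaconda-vg/var-lib-gravity /var/lib/gravity
# Add matching entries to /etc/fstab so the mounts persist across reboots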
Warning
Installations of Workbench that utilize an xfs filesystem must support d_type file labeling to work properly. To enable d_type file labeling, set ftype=1 by running the following command prior to installing Workbench.
This command will erase all data on the specified device! Make sure you are targeting the correct device and that you have backed up any important data from it before proceeding.
mkfs.xfs -n ftype=1 <PATH/TO/YOUR/DEVICE>
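If the target filesystem already exists and you do not want to reformat it, you can check whether it was created with ftype=1 using xfs_info. The mount point below is an example:
xfs_info /opt/anaconda | grep ftype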
| Worker node | Minimum |
|---|---|
| CPU | 16 cores |
| RAM | 64GB |
| Disk space in /var/lib/gravity | 300GB |
| Disk space in /tmp or $TMPDIR | 50GB |
Note
When installing Workbench on a system with multiple nodes, verify that the clock of each node is in sync with the others prior to installation. Anaconda recommends using the Network Time Protocol (NTP) to synchronize computer system clocks automatically over a network. For step-by-step instructions, see How to Synchronize Time with Chrony NTP in Linux.
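As one possible approach on RHEL/CentOS systems, you can install and enable chrony and then confirm that the node’s clock is synchronized. Package and service names may differ on other distributions:
sudo yum install -y chrony
sudo systemctl enable --now chronyd
# Reports the current time source and offset; "Leap status: Normal" indicates a healthy sync
chronyc tracking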
Disk IOPS requirements#
Master and worker nodes require a minimum of 3000 concurrent Input/Output Operations Per Second (IOPS).
Note
Hard disk manufacturers report sequential IOPS, which differ from concurrent IOPS. On-premises installations require servers with disks that support a minimum of 50 sequential IOPS. Anaconda recommends using Solid State Drives (SSDs) or better.
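One way to confirm a disk can sustain the required IOPS is to benchmark it with a tool such as fio (not part of the Workbench tooling; shown here only as an illustrative example). The test file path, size, and runtime below are assumptions you can adjust:
# Mixed random read/write test against the volume that will back /var/lib/gravity
fio --name=iops-test --filename=/var/lib/gravity/fio-test --size=1G \
    --rw=randrw --bs=4k --ioengine=libaio --iodepth=64 --direct=1 \
    --runtime=60 --time_based --group_reporting
# Remove the test file when finished
rm /var/lib/gravity/fio-test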
Cloud performance requirements#
Requirements for running Workbench in the cloud relate to compute power and disk performance. Make sure your chosen cloud platform meets these minimum specifications:
- Amazon Web Services (AWS): Anaconda recommends an instance type no smaller than m4.4xlarge for both master and worker nodes. You must have a minimum of 3000 IOPS.
- Microsoft Azure: Anaconda recommends a VM size of Standard D16s v3 (16 vCPUs, 64GB memory).
- Google Cloud Platform (GCP): There are no unique requirements for installing Workbench on the Google Cloud Platform.
Operating system requirements#
Workbench currently supports the following Linux versions:
- RHEL/CentOS 7.x, 8.x
- Ubuntu 16.04
- SUSE 12 SP2, 12 SP3, 12 SP5 (requires setting DefaultTasksMax=infinity in /etc/systemd/system.conf)
Caution
Some versions of the RHEL 8.4 AMI on AWS ship with a bad ip rule that, in combination with the NetworkManager service, causes problems. Remove the bad rule and disable the NetworkManager service prior to installation.
Security requirements#
- If your Linux system utilizes an antivirus scanner, make sure the scanner excludes the /var/lib/gravity volume from its security scans.
- Installation requires that you have sudo access.
- Nodes running CentOS or RHEL must have Security-Enhanced Linux (SELinux) set to either disabled or permissive mode in the /etc/selinux/config file.
Tip
Check the status of SELinux by running the following command:
getenforce
Configuring SELinux
1. Open the /etc/selinux/config file using your preferred file editor.
2. Find the line that starts with SELINUX= and set it to either disabled or permissive.
3. Save and close the file.
4. Reboot your system for changes to take effect.
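If you prefer to make the change from the command line instead of a text editor, a one-line equivalent of steps 1 and 2 (using permissive as the target mode) could look like this:
sudo sed -i 's/^SELINUX=.*/SELINUX=permissive/' /etc/selinux/config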
Kernel module requirements#
Kubernetes relies on certain functionalities provided by the Linux kernel. The Workbench installer verifies that the following kernel modules (required for Kubernetes to function properly) are present, and notifies you if any are not loaded.
| Linux Distribution | Version | Required Modules |
|---|---|---|
| CentOS | 7.2 | bridge, ebtable_filter, ebtables, iptable_filter, iptable_nat, overlay |
| CentOS | 7.3-7.7, 8.0 | br_netfilter, ebtable_filter, ebtables, iptable_filter, iptable_nat, overlay |
| RedHat Linux | 7.2 | bridge, ebtable_filter, ebtables, iptable_filter, iptable_nat |
| RedHat Linux | 7.3-7.7, 8.0 | br_netfilter, ebtable_filter, ebtables, iptable_filter, iptable_nat, overlay |
| Ubuntu | 16.04 | br_netfilter, ebtable_filter, ebtables, iptable_filter, iptable_nat, overlay |
| SUSE | 12 SP2, 12 SP3, 12 SP5 | br_netfilter, ebtable_filter, ebtables, iptable_filter, iptable_nat, overlay |
| Module Name | Purpose |
|---|---|
| bridge | Enables the Kubernetes iptables-based proxy to operate |
| br_netfilter | Enables the Kubernetes iptables-based proxy to operate |
| overlay | Enables the use of the overlay or overlay2 Docker storage driver |
| ebtable_filter | Allows a service to communicate back to itself via internal load balancing when necessary |
| ebtables | Allows a service to communicate back to itself via internal load balancing when necessary |
| iptable_filter | Ensures the firewall rules set up by Kubernetes function properly |
| iptable_nat | Ensures the firewall rules set up by Kubernetes function properly |
Note
Verify a module is loaded by running the following command:
# Replace <MODULE_NAME> with a module name
lsmod | grep <MODULE_NAME>
If the command produces output, the module is loaded.
If necessary, run the following command to load a module:
# Replace <MODULE_NAME> with a module name
sudo modprobe <MODULE_NAME>
Caution
If your system does not load modules at boot, run the following command for each module to ensure it is loaded on every reboot:
# Replace <MODULE_NAME> with a module name
echo '<MODULE_NAME>' | sudo tee /etc/modules-load.d/<MODULE_NAME>.conf
System control settings#
Workbench requires the following Linux sysctl settings to function properly:
| sysctl setting | Purpose |
|---|---|
| net.bridge.bridge-nf-call-iptables | Communicates with the bridge kernel module to ensure the Kubernetes iptables-based proxy operates |
| net.bridge.bridge-nf-call-ip6tables | Communicates with the bridge kernel module to ensure the Kubernetes iptables-based proxy operates |
| fs.may_detach_mounts | Allows the unmount operation to complete even if there are active references to the filesystem remaining |
| net.ipv4.ip_forward | Required for internal load balancing between servers to work properly |
| fs.inotify.max_user_watches | Set to 1048576 to improve cluster longevity |
Note
If necessary, run the following command to enable a system control setting:
# Replace <SYSCTL_SETTING> with a system control setting
sudo sysctl -w <SYSCTL_SETTING>=1
To persist system settings on boot, run the following command for each setting:
# Replace <SYSCTL_SETTING> with a system control setting
echo "<SYSCTL_SETTING> = 1" | sudo tee /etc/sysctl.d/10-<SYSCTL_SETTING>.conf
GPU requirements#
Workbench requires that you install a supported version of the NVIDIA Compute Unified Device Architecture (CUDA) driver on the host OS of any GPU worker node.
Currently, Workbench supports the following CUDA driver versions:
- CUDA 10.2
- CUDA 11.2
- CUDA 11.4
- CUDA 11.6
Note
Notify your Anaconda Implementation team member which CUDA version you intend to use, so they can provide the correct installer.
You can obtain the driver you need in a few different ways:
- Use the package manager or the NVIDIA runfile to download the driver directly.
- For SLES, CentOS, and RHEL, you can get a supported driver using rpm (local) or rpm (network).
- For Ubuntu, you can get a driver using deb (local) or deb (network).
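Once a driver is installed on a GPU worker node, you can confirm that it loaded correctly and check which CUDA version it supports. The reported CUDA version should be one of the supported versions listed above:
# The driver version and supported CUDA version appear in the header of the output
nvidia-smi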
GPU deployments should use one of the following models:
Tesla V100 (recommended)
Tesla P100 (adequate)
Theoretically, Workbench will work with any GPU card compatible with the CUDA drivers, as long as the drivers are properly installed. Other cards supported by CUDA 11.6:
A-Series: NVIDIA A100, NVIDIA A40, NVIDIA A30, NVIDIA A10
RTX-Series: RTX 8000, RTX 6000, NVIDIA RTX A6000, NVIDIA RTX A5000, NVIDIA RTX A4000, NVIDIA T1000, NVIDIA T600, NVIDIA T400
HGX-Series: HGX A100, HGX-2
T-Series: Tesla T4
P-Series: Tesla P40, Tesla P6, Tesla P4
K-Series: Tesla K80, Tesla K520, Tesla K40c, Tesla K40m, Tesla K40s, Tesla K40st, Tesla K40t, Tesla K20Xm, Tesla K20m, Tesla K20s, Tesla K20c, Tesla K10, Tesla K8
M-Class: M60, M40 24GB, M40, M6, M4
Support for GPUs in Kubernetes is still a work in progress, and each cloud vendor provides different recommendations. For more information about GPUs, see Understanding GPUs.
Network requirements#
Workbench requires the following network ports to be externally accessible:
External ports

| Port | Protocol | Description |
|---|---|---|
| 80 | TCP | Workbench UI (plaintext) |
| 443 | TCP | Workbench UI (encrypted) |
| 32009 | TCP | Operations Center Admin UI |
These ports need to be externally accessible during installation only, and can be closed after completing the install process:
Install ports

| Port | Protocol | Description |
|---|---|---|
| 4242 | TCP | Bandwidth checker utility |
| 61009 | TCP | Install wizard UI access required during cluster installation |
| 61008, 61010, 61022-61024 | TCP | Installer agent ports |
The following ports are used for cluster operation, and must be open internally, between cluster nodes:
Cluster communication ports

| Port | Protocol | Description |
|---|---|---|
| 53 | TCP and UDP | Internal cluster DNS |
| 2379, 2380, 4001, 7001 | TCP | Etcd server communication |
| 3008-3012 | TCP | Internal Workbench service |
| 3022-3025 | TCP | Teleport internal SSH control panel |
| 3080 | TCP | Teleport Web UI |
| 5000 | TCP | Docker registry |
| 6443 | TCP | Kubernetes API server |
| 6990 | TCP | Internal Workbench service |
| 7496, 7373 | TCP | Peer-to-peer health check |
| 7575 | TCP | Cluster status gRPC API |
| 8081, 8086-8091, 8095 | TCP | Internal Workbench service |
| 8472 | UDP | Overlay network |
| 9080, 9090, 9091 | TCP | Internal Workbench service |
| 10248-10250, 10255 | TCP | Kubernetes components |
| 30000-32767 | TCP | Kubernetes internal services range |
Make sure the firewall is configured to keep the required ports open and to persist these settings across reboots. Then restart the firewall to load your changed settings.
Tip
There are various tools you can use to configure firewalls and open the required ports, including iptables, firewall-cmd, SuSEfirewall2, and others.
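As one illustration, on systems that use firewalld you could open the externally required ports permanently and then reload the firewall. The same pattern applies to the install-time and internal cluster ports your topology requires:
sudo firewall-cmd --permanent --add-port=80/tcp
sudo firewall-cmd --permanent --add-port=443/tcp
sudo firewall-cmd --permanent --add-port=32009/tcp
sudo firewall-cmd --reload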
You’ll also need to update your firewall settings to ensure that the 10.244.0.0/16 pod subnet and 10.100.0.0/16 service subnet are accessible to every node in the cluster, and grant all nodes the ability to communicate via their primary interface.
For example, if you’re using iptables:
# Replace <NODE_IP> with the internal IP address(es) used by all nodes in the cluster to connect to the master node
iptables -A INPUT -s 10.244.0.0/16 -j ACCEPT
iptables -A INPUT -s 10.100.0.0/16 -j ACCEPT
iptables -A INPUT -s <NODE_IP> -j ACCEPT
If you plan to use online package mirroring, allowlist the following domains in your network’s firewall settings:
repo.anaconda.com
anaconda.org
conda.anaconda.org
binstar-cio-packages-prod.s3.amazonaws.com
To use Workbench in conjunction with Anaconda Navigator in online mode, allowlist the following sites in your network’s firewall settings as well:
https://repo.anaconda.com (or for older versions of Navigator and conda)
https://conda.anaconda.org if any users will use conda-forge and other channels on Anaconda.org
google-public-dns-a.google.com (8.8.8.8:53) to check internet connectivity with Google Public DNS
TLS/SSL certificate requirements#
Workbench uses certificates to provide transport layer security for the cluster. Self-signed certificates are generated during the initial installation. Once installation is complete, you can configure the platform to use your organizational TLS/SSL certificates.
You can purchase certificates commercially, or generate them using your organization’s internal public key infrastructure (PKI) system. When using an internal PKI-signed setup, the CA certificate is inserted into the Kubernetes secret.
In either case, the configuration will include the following:
A certificate for the root certificate authority (CA)
An intermediate certificate chain
A server certificate
A certificate private key
For more information about TLS/SSL certificates, see Updating TLS/SSL certificates.
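Before configuring Workbench with your certificates, it can be useful to confirm that the server certificate chains to your CA and matches its private key. The file names below are example placeholders:
# Verify the server certificate against the root CA and intermediate chain
openssl verify -CAfile rootca.pem -untrusted intermediate.pem server.crt
# Confirm the certificate and (RSA) private key share the same modulus
openssl x509 -noout -modulus -in server.crt | openssl md5
openssl rsa -noout -modulus -in server.key | openssl md5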
DNS requirements#
Workbench assigns unique URL addresses to deployments by combining a dynamically generated universally unique identifier (UUID) with your organization’s domain name, like this: https://uuid001.anaconda.yourdomain.com.
This requires the use of wildcard DNS entries that apply to a set of domain names, such as *.anaconda.yourdomain.com.
For example, if you are using the domain name anaconda.yourdomain.com with a master node IP address of 12.34.56.78, the DNS entries would be as follows:
anaconda.yourdomain.com IN A 12.34.56.78
*.anaconda.yourdomain.com IN A 12.34.56.78
Note
The wildcard subdomain’s DNS entry points to the Workbench master node.
The master node’s hostname and the wildcard domains must be resolvable with DNS from the master nodes, worker nodes, and the end user machines. To ensure the master node can resolve its own hostname, distribute any /etc/hosts entries to the Gravity environment.
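You can spot-check name resolution from any node or end-user machine with a DNS lookup tool such as dig. The domain and address follow the example entries above; both queries should return the master node IP address:
dig +short anaconda.yourdomain.com
dig +short uuid001.anaconda.yourdomain.com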
Caution
If dnsmasq is installed on the master node or any worker nodes, you’ll need to remove it from all nodes prior to installing Workbench.
Verify that dnsmasq is disabled by running the following command:
sudo systemctl status dnsmasq
If necessary, stop and disable dnsmasq by running the following commands:
sudo systemctl stop dnsmasq
sudo systemctl disable dnsmasq
Browser requirements#
Workbench supports the following web browsers:
Chrome 39+
Firefox 49+
Safari 10+
The minimum browser screen size for using the platform is 800 pixels wide and 600 pixels high.
Verifying system requirements#
The installer performs pre-installation checks, and only allows installation to continue on nodes that are configured correctly and include the required kernel modules. If you want to perform the system check yourself prior to installation, you can run the following commands from the installer directory, ~/anaconda-enterprise-<VERSION>, on your intended master and worker nodes:
To perform system checks on the master node, run the following command as a sudo or root user:
sudo ./gravity check --profile ae-master
To perform system checks on a worker node, run the following command as a sudo or root user:
sudo ./gravity check --profile ae-worker
If all of the system checks pass and all requirements are met, the output from the above commands will be empty. If the system checks fail and some requirements are not met, the output will indicate which system checks failed.
Pre-installation checklist#
Anaconda has created this pre-installation checklist to help you verify that you have properly prepared your environment prior to installation. You can run the system verification checks to automatically verify many of the requirements for you.
Caution
System verification checks are not comprehensive, so make sure you manually verify the remaining requirements.
Gravity pre-installation checklist
- All nodes in the cluster meet the minimum or recommended specifications for CPU, RAM, and disk space.
- All nodes in the cluster meet the minimum IOPS required for reliable performance.
- All cluster nodes are running the same version of the OS, and that OS version is supported by Workbench.
- NTP is being used to synchronize computer system clocks, and all nodes are in sync.
- The user account performing the installation has sudo access on all nodes and is not the root user.
- All required kernel modules are loaded.
- The sysctl settings are configured correctly.
- Any GPUs to be used with Workbench have a supported NVIDIA CUDA driver installed.
- The system meets all network port requirements, whether the specified ports need to be open internally, externally, or during installation only.
- The firewall is configured correctly, and any rules designed to limit traffic have been temporarily disabled until Workbench is installed and verified.
- If necessary, the domains required for online package mirroring have been allowlisted.
- The final TLS/SSL certificates to be installed with Workbench have been obtained, including the private keys.
- The Workbench A or CNAME domain record is fully operational and points to the IP address of the master node.
- The wildcard DNS entry for Workbench is also fully operational and points to the IP address of the master node. For more information, see the DNS requirements section above.
- The /etc/resolv.conf file on all nodes does not include the rotate option.
- Any existing installations of Docker (and dockerd), dnsmasq, and lxd have been removed from all nodes, as they will conflict with Workbench.
- All web browsers to be used to access Workbench are supported by the platform.