Preparing a K3s environment for Workbench#

The resource requirements for a Kubernetes cluster depend on several factors, including the types of applications you will run, the number of users active at once, and the workloads you will manage within the cluster. Data Science & AI Workbench’s performance is tightly coupled with the health of your Kubernetes stack, so it is important to allocate enough resources to manage your users’ workloads. Generally speaking, your system should contain at least 1 CPU, 1GB of RAM, and 5GB of disk space for each project session or deployment.
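
For example, a cluster expected to support 20 concurrent sessions or deployments should reserve at least 20 CPUs, 20GB of RAM, and 100GB of disk space for them, on top of the baseline node requirements below.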

Hardware requirements#

Anaconda’s hardware recommendations ensure a reliable and performant Kubernetes cluster.

The following are minimum specifications for the control plane and worker nodes.

Control plane node              Minimum
------------------------------  --------
CPU                             16 cores
RAM                             64GB
Disk space in /opt/anaconda     500GB
Disk space in /var/lib/rancher  300GB
Disk space in /tmp or $TMPDIR   50GB

Note

  • Disk space reserved for /var/lib/rancher is utilized as additional space to accommodate upgrades. Anaconda recommends having this available during installation.

  • The /var/lib/rancher volume must be mounted on local storage. Core components of Kubernetes run from this directory, some of which are extremely intolerant of disk latency. Therefore, Network-Attached Storage (NAS) and Storage Area Network (SAN) solutions are not supported for this volume.

  • Anaconda recommends that you set up the /opt/anaconda and /var/lib/rancher partitions using Logical Volume Management (LVM) to provide the flexibility needed to accommodate future expansion, as shown in the sketch after this list.

  • Disk space reserved for /opt/anaconda is utilized for project and package storage (including mirrored packages).
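
A minimal LVM sketch for the two partitions above, assuming a spare local disk at /dev/sdb; the device, volume group, and logical volume names are illustrative, and the sizes follow the table above:

# Assumes a spare local disk at /dev/sdb; all names below are illustrative
sudo pvcreate /dev/sdb
sudo vgcreate vg_workbench /dev/sdb

# One logical volume per data directory, sized per the table above
sudo lvcreate -L 500G -n lv_opt_anaconda vg_workbench
sudo lvcreate -L 300G -n lv_rancher vg_workbench

# Create filesystems and mount them
sudo mkfs.xfs /dev/vg_workbench/lv_opt_anaconda
sudo mkfs.xfs /dev/vg_workbench/lv_rancher
sudo mkdir -p /opt/anaconda /var/lib/rancher
sudo mount /dev/vg_workbench/lv_opt_anaconda /opt/anaconda
sudo mount /dev/vg_workbench/lv_rancher /var/lib/rancher

# Persist the mounts across reboots
echo '/dev/vg_workbench/lv_opt_anaconda /opt/anaconda xfs defaults 0 0' | sudo tee -a /etc/fstab
echo '/dev/vg_workbench/lv_rancher /var/lib/rancher xfs defaults 0 0' | sudo tee -a /etc/fstab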

Worker node                     Minimum
------------------------------  --------
CPU                             16 cores
RAM                             64GB
Disk space in /var/lib/rancher  300GB
Disk space in /tmp or $TMPDIR   50GB

Note

When installing Workbench on a system with multiple nodes, verify that the clock of each node is in sync with the others prior to installation. Anaconda recommends using the Network Time Protocol (NTP) to synchronize computer system clocks automatically over a network. For step-by-step instructions, see How to Synchronize Time with Chrony NTP in Linux.
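
If you use chrony as your NTP client, you can confirm that a node’s clock is synchronized before installing. Both commands are standard on systemd-based distributions:

chronyc tracking    # reports the current offset from the NTP source
timedatectl status  # shows "System clock synchronized: yes" when in sync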

Disk IOPS requirements#

Nodes require a minimum of 3000 input/output operations per second (IOPS).

Note

Solid state disks are strongly recommended for optimal performance.
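
One way to gauge whether a volume meets this requirement is a short fio benchmark, assuming fio is installed. This sketch runs a 60-second 4k random read/write test against a scratch file on the /var/lib/rancher volume; remove the file when finished:

sudo fio --name=iops-test --filename=/var/lib/rancher/fio-test --size=1G \
  --ioengine=libaio --direct=1 --rw=randrw --bs=4k --iodepth=64 \
  --runtime=60 --time_based --group_reporting
sudo rm /var/lib/rancher/fio-test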

Cloud performance requirements#

Requirements for running Workbench in the cloud relate to compute power and disk performance.

Minimum specifications:
  • CPU: 8 vCPU

  • Memory: 32GB RAM

Recommended specifications:
  • CPU: 16 vCPU

  • Memory: 64GB RAM

Operating system requirements#

Please see the official K3s documentation for information on supported operating systems.

Caution

  • You must remove Docker or Podman from the server, if present.
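
A quick way to confirm that neither runtime remains on a node is shown below; the actual package removal commands depend on your distribution:

# Prints a path for each runtime still installed; no output means none found
command -v docker podman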

Security requirements#

  • If your Linux system utilizes an antivirus scanner, ensure that the scanner excludes the /var/lib/rancher volume from its security scans.

  • Installation requires that you have sudo access.

  • RHEL instances must disable nm-cloud-setup.

    Disabling nm-cloud-setup

    Disable nm-cloud-setup by running the following command:

    systemctl disable nm-cloud-setup.service nm-cloud-setup.timer
    
  • Nodes running CentOS or RHEL must ensure that Security Enhanced Linux (SELinux) is set to either disabled or permissive mode in the /etc/selinux/config file.

    Tip

    Check the status of SELinux by running the following command:

    getenforce
    
    Configuring SELinux
    1. Open the /etc/selinux/config file using your preferred file editor.

    2. Find the line that starts with SELINUX= and set it to either disabled or permissive.

    3. Save and close the file.

    4. Reboot your system for changes to take effect.
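
If you prefer a one-liner, steps 1 through 3 can be performed with sed. This assumes the file currently reads SELINUX=enforcing:

sudo sed -i 's/^SELINUX=enforcing/SELINUX=permissive/' /etc/selinux/config
# Reboot, then confirm the change with getenforce (it should report "Permissive")
sudo reboot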

Network requirements#

Please see the official K3s documentation regarding network requirements.

Firewall requirements#

Anaconda recommends removing OS-level firewalls altogether. If that is not possible, review the K3s requirements for configuring the firewall on your OS.

Mirroring with a firewall

If you plan to use online package mirroring, allowlist the following domains in your network’s firewall settings:

  • repo.anaconda.com

  • anaconda.org

  • conda.anaconda.org

  • binstar-cio-packages-prod.s3.amazonaws.com

To use Workbench in conjunction with Anaconda Navigator in online mode, allowlist the following sites in your network’s firewall settings as well:

  • https://repo.anaconda.com — For use of older versions of Navigator and conda

  • https://conda.anaconda.org — For use of conda-forge and other channels on Anaconda.org

  • google-public-dns-a.google.com (8.8.8.8:53) — To check internet connectivity with Google Public DNS.
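
To spot-check that the allowlist is in effect from inside your network, a simple loop with curl works, assuming curl is installed; each reachable domain prints an HTTP status line:

for host in repo.anaconda.com anaconda.org conda.anaconda.org; do
  curl -sI "https://$host" | head -n 1
done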

TLS/SSL certificate requirements#

Workbench uses certificates to provide transport layer security for the cluster. Self-signed certificates are generated during the initial installation. Once installation is complete, you can configure the platform to use your organizational TLS/SSL certificates.

You can purchase certificates commercially or generate them using your organization’s internal public key infrastructure (PKI) system. When using an internal PKI-signed setup, the CA certificate is inserted into the Kubernetes secret.

In either case, the configuration will include the following:

  • A certificate for the root certificate authority (CA)

  • An intermediate certificate chain

  • A server certificate

  • A certificate private key

For more information about TLS/SSL certificates, see Updating TLS/SSL certificates.
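
Before installation, it is worth confirming that these pieces fit together. A minimal sketch with openssl, assuming the files are named rootca.pem, intermediate.pem, server.pem, and server.key (RSA):

# Verify that the server certificate chains up to the root CA
openssl verify -CAfile rootca.pem -untrusted intermediate.pem server.pem

# Confirm the private key matches the server certificate
# (the two digests must be identical)
openssl x509 -noout -modulus -in server.pem | openssl md5
openssl rsa -noout -modulus -in server.key | openssl md5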

DNS requirements#

Workbench assigns unique URL addresses to deployments by combining a dynamically generated universally unique identifier (UUID) with your organization’s domain name, like this: https://uuid001.anaconda.yourdomain.com.

This requires the use of wildcard DNS entries that apply to a set of domain names such as *.anaconda.yourdomain.com.

For example, if you are using the domain name anaconda.yourdomain.com with a control plane node IP address of 12.34.56.78, the DNS entries would be as follows:

anaconda.yourdomain.com IN A 12.34.56.78
*.anaconda.yourdomain.com IN A 12.34.56.78

Note

The wildcard subdomain’s DNS entry points to the Workbench control plane node.

The control plane node’s hostname and the wildcard domains must be resolvable with DNS from the control plane node, worker nodes, and the end user’s machines. To ensure the control plane node can resolve its own hostname, distribute any /etc/hosts entries to the K3s environment.
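
You can verify both records from any node with dig; using the example above, each query should return 12.34.56.78. The app-test label is arbitrary; any subdomain exercises the wildcard:

dig +short anaconda.yourdomain.com
dig +short app-test.anaconda.yourdomain.com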

Caution

If dnsmasq is installed on the control plane node or any worker nodes, you’ll need to remove it from all nodes prior to installing Workbench.

Verify dnsmasq is disabled by running the following command:

sudo systemctl status dnsmasq

If necessary, stop and disable dnsmasq by running the following commands:

sudo systemctl stop dnsmasq
sudo systemctl disable dnsmasq

Helm chart#

Helm is the tool Workbench uses to streamline the creation, packaging, configuration, and deployment of the application. It combines all of the objects needed to deploy the application within your cluster into a single reusable package called a Helm chart. These resources include .yaml configuration files, services, secrets, and config maps.

For K3s, Workbench includes a values.k3s.yaml file that overrides the default values in the top-level Helm chart. Before installing, add to and modify this file to reflect your current cluster configuration.

Note

These default configurations are meant for a single-tenant cluster. If you are utilizing a multi-tenant cluster, modify the rbac parameters where present to scope to the namespace only.

Helm values.k3s.yaml template

Note

This template is heavily commented to guide you through the parameters that usually require modification.

# This values.yaml template is intended to be customized
# for each installation. Its values *augment and override*
# the default values found in Anaconda-Enterprise/values.yaml.

global:
  # global.hostname -- The fully qualified domain name (FQDN) of the cluster.
  # @section -- Global Common Parameters
  hostname: "anaconda.example.com"

  # global.version -- (string) The application version; defaults to `Chart.AppVersion`.
  # @section -- Global Common Parameters
  version:

  # Uncomment for OpenShift only
  # dnsServer: dns-default.openshift-dns.svc.cluster.local

  # The UID under which to run the containers (required)
  runAsUser: 1000

  # Docker registry information
  image:
    # Repository for Workbench images.
    # Trailing slash required if not empty
    server: "aedev/"
    # A single pull secret name, or a list of names, as required
    pullSecrets:

  # Global Service Account Settings
  serviceAccount:
    # global.serviceAccount.name -- Service account name
    # @section -- Global RBAC Parameters
    name: "anaconda-enterprise"

# If the DNS record for the hostname above resolves to an
# address inaccessible from the cluster, supply a valid
# IP address for the ingress or load balancer here.
privateIP: ""

# rbac
serviceAccount:
  # serviceAccount.create -- Controls the creation of the service account
  # @section -- RBAC Parameters
  create: true
rbac:
  # rbac.create -- Controls the creation and binding of rbac resources. This excludes ingress.
  # See `.Values.ingress.install` for additional details on managing rbac for that resource
  # type.
  # @section -- RBAC Parameters
  create: true

# generateCerts -- Self-signed certificate handling.
# `generate`: create new self-signed certificates.
# `load`: use the certificates in Anaconda-Enterprise/certs.
# `skip`: do nothing; assume the secrets already exist.
# Existing secrets are always preserved during upgrades.
# @section -- TLS / SSL Secret Management
generateCerts: "generate"

# Keycloak LDAPS Settings
# truststore: path to your truststore file containing custom CA cert
# truststore_password: password of the truststore
# truststore_secret: name of secret used for the truststore such as anaconda-enterprise-truststore
keycloak:
  # keycloak.truststore -- Java Truststore File
  # @section -- Keycloak Parameters
  truststore: ""

  # keycloak.truststore_password -- Java Truststore Password
  # @section -- Keycloak Parameters
  truststore_password: ""

  # keycloak.truststore_secret -- Java Truststore Secret
  # @section -- Keycloak Parameters
  truststore_secret: ""

  # keycloak.tempUsername --
  # Important note: these have an effect only during
  # initial installation. If an administrative user
  # already exists, these values are ignored.
  # @section -- Keycloak Parameters
  tempUsername: "admin"

  # keycloak.tempPassword --
  # Important note: these have an effect only during
  # initial installation. If an administrative user
  # already exists, these values are ignored.
  # @section -- Keycloak Parameters
  tempPassword: "admin"

ingress:
  # ingress.className -- (string) If an existing ingress controller is being used, this
  # must match the ingress.className of that controller.
  # Cannot be empty if ingress.install is true.
  # @section -- Ingress Parameters
  className: "traefik"

  # ingress.install -- Ingress Install Control.
  # `false`: an existing ingress controller will be used.
  # `true`: install an ingress controller in this namespace.
  # @section -- Ingress Parameters
  install: false

  # ingress.installClass -- IngressClass Install Control.
  # `false`: an existing IngressClass resource will be used.
  # `true`: create a new IngressClass in the global namespace.
  # Ignored if ingress.install is `false`.
  # @section -- Ingress Parameters
  installClass: false

  # ingress.labels -- `.metadata.labels` for the ingress.
  # If your ingress controller requires custom labels to be
  # added to ingress entries, list them here as a dictionary
  # of key/value pairs.
  # @section -- Ingress Parameters
  labels: {}

  # If your ingress requires custom annotations to be added
  # to ingress entries, they can be included here. These
  # will be added to any existing annotations in the chart.
  # For all ingress entries
  global: {}
  # For the master ingress only
  system: {}
  # For sessions and deployments only
  user: {}

# To configure an external Git repository, uncomment this section and fill
# in the relevant values. For more details, consult this page:
# https://enterprise-docs.anaconda.com/en/latest/admin/advanced/config-repo.html
#
# git:
#   type: github-v3-api
#   name: Github.com Repo
#   url: https://api.github.com/
#   credential-url: https://api.github.com/anaconda-test-org
#   organization: anaconda-test-org
#   repository: {owner}-{id}
#   username: somegituser
#   auth-token: 98bcf2261707794b4a56f24e23fd6ed771d6c742
#   http-timeout: 60
#   disable-tls-verification: false
#   create-args: {}

# As discussed in the documentation, you may use the same
# persistent volume for both storage resources. If so, make
# sure to use the same pvc: value in both locations.
storage:
  create: true
  pvc: "anaconda-storage"
persistence:
  pvc: "anaconda-storage"

# TOLERATIONS / AFFINITY
# Please work with the Anaconda team for assistance
# to configure these settings if you need them.

tolerations:
  # For all pods
  global: []
  # For system pods, except the ingress
  system: []
  # For the ingress daemonset alone
  ingress: []
  # For user pods
  user: []

affinity:
  # For all pods
  global: {}
  # For system pods, except the ingress
  system: {}
  # For the ingress daemonset alone
  ingress: {}
  # For user pods
  user: {}

# By default, all ops services are enabled for k3s installations.
# Consult the documentation for details on how to configure each service.

opsDashboard:
  enabled: true
opsMetrics:
  enabled: true
opsGrafana:
  enabled: true
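
Once values.k3s.yaml reflects your environment, pass it to Helm with -f so its values override the chart defaults. A minimal sketch, assuming the chart directory is ./Anaconda-Enterprise and the target namespace is anaconda; substitute the chart location and namespace from your installation materials:

helm install anaconda-enterprise ./Anaconda-Enterprise \
  -f values.k3s.yaml \
  --namespace anaconda --create-namespace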

Pre-installation checklist#

Anaconda has created this pre-installation checklist to help you verify that you have properly prepared your environment prior to installation.

K3s pre-installation checklist

All nodes in the cluster meet the minimum or recommended specifications for CPU, RAM, and disk space.

All nodes in the cluster meet the minimum IOPS required for reliable performance.

All cluster nodes are operating the same OS version, and the OS is supported.

NTP is being used to synchronize computer system clocks, and all nodes are in sync.

The user account performing the installation has sudo access on all nodes and is not a root user.

The system meets all K3s network requirements.

The firewall is either disabled or configured correctly.

If necessary, the domains required for online package mirroring have been allowlisted.

The final TLS/SSL certificates to be installed with Workbench have been obtained, including the private keys.

The Workbench A or CNAME domain record is fully operational and points to the IP address of the control plane node.

The wildcard DNS entry for Workbench is also fully operational and points to the IP address of the control plane node. For more information, see the DNS requirements section above.

The /etc/resolv.conf file on all the nodes does not include the rotate option.

Any existing installations of Docker (and dockerd), dnsmasq, and lxd have been removed from all nodes, as they will conflict with Workbench.