Preparing a K3s environment for Workbench#
The resource requirements for a Kubernetes cluster depend on a number of factors, including the types of applications you will be running, the number of users that are active at once, and the workloads you will be managing within the cluster. Data Science & AI Workbench’s performance is tightly coupled with the health of your Kubernetes stack, so it is important to allocate enough resources to manage your users’ workloads. Generally speaking, your system should contain at least 1 CPU, 1GB of RAM, and 5GB of disk space for each project session or deployment. For example, supporting 20 concurrent sessions or deployments would require roughly 20 CPUs, 20GB of RAM, and 100GB of disk space for session workloads alone, in addition to the platform’s baseline requirements.
Hardware requirements#
Anaconda’s hardware recommendations ensure a reliable and performant Kubernetes cluster.
The following are minimum specifications for the control plane and worker nodes, as well as the entire cluster.
Control plane node (minimum):

CPU: 16 cores
RAM: 64GB
Disk space in /opt/anaconda: 500GB
Disk space in /var/lib/rancher: 300GB
Disk space in /tmp or $TMPDIR: 50GB
Note
Disk space reserved for /var/lib/rancher is utilized as additional space to accommodate upgrades. Anaconda recommends having this available during installation.
The /var/lib/rancher volume must be mounted on local storage. Core components of Kubernetes run from this directory, some of which are extremely intolerant of disk latency. Therefore, Network-Attached Storage (NAS) and Storage Area Network (SAN) solutions are not supported for this volume.
Anaconda recommends that you set up the /opt/anaconda and /var/lib/rancher partitions using Logical Volume Management (LVM) to provide the flexibility needed to accommodate future expansion, as shown in the sketch below.
Disk space reserved for /opt/anaconda is utilized for project and package storage (including mirrored packages).
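If you choose the LVM approach, the general shape of the setup looks like the following sketch. It assumes a single dedicated data disk at /dev/sdb and XFS filesystems, both of which are illustrative; substitute your own device names, sizes, and filesystem, and add matching /etc/fstab entries so the mounts persist across reboots.

# Create a volume group on the data disk, then carve out the two volumes
sudo pvcreate /dev/sdb
sudo vgcreate anaconda-vg /dev/sdb
sudo lvcreate -L 500G -n opt-anaconda anaconda-vg
sudo lvcreate -L 300G -n rancher anaconda-vg

# Format the volumes and mount them at the paths Workbench expects
sudo mkfs.xfs /dev/anaconda-vg/opt-anaconda
sudo mkfs.xfs /dev/anaconda-vg/rancher
sudo mkdir -p /opt/anaconda /var/lib/rancher
sudo mount /dev/anaconda-vg/opt-anaconda /opt/anaconda
sudo mount /dev/anaconda-vg/rancher /var/lib/rancher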
Worker node (minimum):

CPU: 16 cores
RAM: 64GB
Disk space in /var/lib/rancher: 300GB
Disk space in /tmp or $TMPDIR: 50GB
Note
When installing Workbench on a system with multiple nodes, verify that the clock of each node is in sync with the others prior to installation. Anaconda recommends using the Network Time Protocol (NTP) to synchronize computer system clocks automatically over a network. For step-by-step instructions, see How to Synchronize Time with Chrony NTP in Linux.
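To spot-check that a node’s clock is synchronized, you can run the following commands; this sketch assumes a systemd host using chrony, per the recommendation above.

timedatectl status    # look for "System clock synchronized: yes"
chronyc tracking      # shows the current offset from the NTP source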
Disk IOPS requirements#
Nodes require a minimum of 3000 concurrent Input/Output Operations Per Second (IOPS).
Note
Solid state disks are strongly recommended for optimal performance.
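One common way to measure a volume’s random I/O capability is the fio benchmarking tool. fio is not part of Workbench and must be installed separately, and the target path and parameters below are only an example; the run writes a temporary 1GB test file, so remove it afterward.

fio --name=iops-test --filename=/var/lib/rancher/fio-test \
    --ioengine=libaio --direct=1 --rw=randrw --bs=4k \
    --size=1G --iodepth=64 --runtime=60 --time_based --group_reporting
rm /var/lib/rancher/fio-test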
Cloud performance requirements#
Requirements for running Workbench in the cloud relate to compute power and disk performance.
- Minimum specifications:
  CPU: 8 vCPU
  RAM: 32GB
- Recommended specifications:
  CPU: 16 vCPU
  RAM: 64GB
Operating system requirements#
Please see the official K3s documentation for information on supported operating systems.
Caution
You must remove Docker or Podman from the server, if present.
Security requirements#
If your Linux system utilizes an antivirus scanner, ensure that the scanner excludes the /var/lib/rancher volume from its security scans.
Installation requires that you have sudo access.
RHEL instances must disable nm-cloud-setup.

Disabling nm-cloud-setup
Disable nm-cloud-setup by running the following command:

systemctl disable nm-cloud-setup.service nm-cloud-setup.timer
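To confirm the change took effect, you can check the state of both units; note that the upstream K3s documentation also recommends rebooting the node after disabling nm-cloud-setup.

systemctl is-enabled nm-cloud-setup.service nm-cloud-setup.timer   # expect "disabled" for each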
Nodes running CentOS or RHEL must ensure that Security-Enhanced Linux (SELinux) is set to either disabled or permissive mode in the /etc/selinux/config file.

Tip
Check the status of SELinux by running the following command:

getenforce

Configuring SELinux

1. Open the /etc/selinux/config file using your preferred file editor.
2. Find the line that starts with SELINUX= and set it to either disabled or permissive.
3. Save and close the file.
4. Reboot your system for the changes to take effect.
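If you prefer a non-interactive edit, the same change can be made with a one-liner; this sketch assumes the current value is enforcing.

sudo sed -i 's/^SELINUX=enforcing/SELINUX=permissive/' /etc/selinux/config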
Network requirements#
Please see the official K3s documentation regarding network requirements.
Firewall requirements#
Anaconda recommends removing OS-level firewalls altogether. If that is not possible, review the K3s requirements for how to configure the firewall for your OS.
Mirroring with a firewall
If you plan to use online package mirroring, allowlist the following domains in your network’s firewall settings:
repo.anaconda.com
anaconda.org
conda.anaconda.org
binstar-cio-packages-prod.s3.amazonaws.com
To use Workbench in conjunction with Anaconda Navigator in online mode, allowlist the following sites in your network’s firewall settings as well:
https://repo.anaconda.com — For use of older versions of Navigator and conda
https://conda.anaconda.org — For use of conda-forge and other channels on Anaconda.org
google-public-dns-a.google.com (8.8.8.8:53) — To check internet connectivity with Google Public DNS.
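A quick way to confirm the allowlist is working from inside your network is to probe each domain; this sketch assumes curl is installed and that HTTPS egress is what your mirror will use.

for host in repo.anaconda.com anaconda.org conda.anaconda.org \
    binstar-cio-packages-prod.s3.amazonaws.com; do
  curl -sSI --max-time 10 "https://$host" > /dev/null \
    && echo "$host: reachable" \
    || echo "$host: BLOCKED or unreachable"
done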
TLS/SSL certificate requirements#
Workbench uses certificates to provide transport layer security for the cluster. Self-signed certificates are generated during the initial installation. Once installation is complete, you can configure the platform to use your organizational TLS/SSL certificates.
You can purchase certificates commercially or generate them using your organization’s internal public key infrastructure (PKI) system. When using an internal PKI-signed setup, the CA certificate is inserted into the Kubernetes secret.
In either case, the configuration will include the following:
A certificate for the root certificate authority (CA)
An intermediate certificate chain
A server certificate
A certificate private key
For more information about TLS/SSL certificates, see Updating TLS/SSL certificates.
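Before installation, you can sanity-check that the pieces fit together. The following is a sketch assuming PEM-encoded files with the hypothetical names rootca.pem, intermediate.pem, server.pem, and server.key (RSA key); adjust to your actual filenames.

# Confirm the server certificate chains to your root CA
openssl verify -CAfile rootca.pem -untrusted intermediate.pem server.pem

# Confirm the private key matches the server certificate (the two hashes should be identical)
openssl x509 -noout -modulus -in server.pem | openssl md5
openssl rsa -noout -modulus -in server.key | openssl md5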
DNS requirements#
Workbench assigns unique URL addresses to deployments by combining a dynamically generated universally unique identifier (UUID) with your organization’s domain name, like this: https://uuid001.anaconda.yourdomain.com.
This requires the use of wildcard DNS entries that apply to a set of domain names, such as *.anaconda.yourdomain.com.
For example, if you are using the domain name anaconda.yourdomain.com with a control plane node IP address of 12.34.56.78, the DNS entries would be as follows:
anaconda.yourdomain.com IN A 12.34.56.78
*.anaconda.yourdomain.com IN A 12.34.56.78
Note
The wildcard subdomain’s DNS entry points to the Workbench control plane node.
The control plane node’s hostname and the wildcard domains must be resolvable with DNS from the control plane node, worker nodes, and the end users’ machines. To ensure the control plane node can resolve its own hostname, distribute any /etc/hosts entries to the K3s environment.
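You can verify both records before installing; this assumes dig is available, and uuid001 is an arbitrary test label that should resolve via the wildcard entry. Both commands should return the control plane node’s IP address.

dig +short anaconda.yourdomain.com
dig +short uuid001.anaconda.yourdomain.com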
Caution
If dnsmasq is installed on the control plane node or any worker nodes, you’ll need to remove it from all nodes prior to installing Workbench.
Verify dnsmasq is disabled by running the following command:

sudo systemctl status dnsmasq

If necessary, stop and disable dnsmasq by running the following commands:

sudo systemctl stop dnsmasq
sudo systemctl disable dnsmasq
Helm chart#
Helm is the tool Workbench uses to streamline the creation, packaging, configuration, and deployment of the application. It combines all of the config map objects into a single reusable package called a Helm chart. This chart contains all the resources necessary to deploy the application within your cluster, including .yaml configuration files, services, secrets, and config maps.
For K3s, Workbench includes a values.k3s.yaml file that overrides the default values in the top-level Helm chart. Add and modify values in this file to reflect your current cluster configuration before you install.
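For orientation, Helm applies override files via the -f flag, with values in the override file taking precedence over the chart defaults. The release name and chart path below are placeholders; follow the installation instructions for the exact command used for Workbench.

helm upgrade --install anaconda ./Anaconda-Enterprise -f values.k3s.yaml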
Note
These default configurations are meant for a single-tenant cluster. If you are utilizing a multi-tenant cluster, modify the rbac parameters where present to scope them to the namespace only.
Helm values.k3s.yaml template
Note
This template is heavily commented to guide you through the parameters that usually require modification.
# This values.yaml template is intended to be customized
# for each installation. Its values *augment and override*
# the default values found in Anaconda-Enterprise/values.yaml.
global:
  # global.hostname -- The fully qualified domain name (FQDN) of the cluster.
  # @section -- Global Common Parameters
  hostname: "anaconda.example.com"

  # global.version -- (string) The application version; defaults to `Chart.AppVersion`.
  # @section -- Global Common Parameters
  version:

  # Uncomment for OpenShift only
  # dnsServer: dns-default.openshift-dns.svc.cluster.local

  # The UID under which to run the containers (required)
  runAsUser: 1000

  # Docker registry information
  image:
    # Repository for Workbench images.
    # Trailing slash required if not empty
    server: "aedev/"
    # A single pull secret name, or a list of names, as required
    pullSecrets:

  # Global Service Account Settings
  serviceAccount:
    # global.serviceAccount.name -- Service account name
    # @section -- Global RBAC Parameters
    name: "anaconda-enterprise"

  # If the DNS record for the hostname above resolves to an
  # address inaccessible from the cluster, supply a valid
  # IP address for the ingress or load balancer here.
  privateIP: ""

# rbac
serviceAccount:
  # serviceAccount.create -- Controls the creation of the service account
  # @section -- RBAC Parameters
  create: true
rbac:
  # rbac.create -- Controls the creation and binding of rbac resources. This excludes ingress.
  # See `.Values.ingress.install` for additional details on managing rbac for that resource
  # type.
  # @section -- RBAC Parameters
  create: true

# generateCerts -- Generate Self-Signed Certificates.
# `load`: use the certificates in Anaconda-Enterprise/certs.
# `skip`: do nothing; assume the secrets already exist.
# Existing secrets are always preserved during upgrades.
# @section -- TLS / SSL Secret Management
generateCerts: "generate"

# Keycloak LDAPS Settings
# truststore: path to your truststore file containing custom CA cert
# truststore_password: password of the truststore
# truststore_secret: name of secret used for the truststore such as anaconda-enterprise-truststore
keycloak:
  # keycloak.truststore -- Java Truststore File
  # @section -- Keycloak Parameters
  truststore: ""
  # keycloak.truststore_password -- Java Truststore Password
  # @section -- Keycloak Parameters
  truststore_password: ""
  # keycloak.truststore_secret -- Java Truststore Secret
  # @section -- Keycloak Parameters
  truststore_secret: ""
  # keycloak.tempUsername --
  # Important note: these have an effect only during
  # initial installation. If an administrative user
  # already exists, these values are ignored.
  # @section -- Keycloak Parameters
  tempUsername: "admin"
  # keycloak.tempPassword --
  # Important note: these have an effect only during
  # initial installation. If an administrative user
  # already exists, these values are ignored.
  # @section -- Keycloak Parameters
  tempPassword: "admin"

ingress:
  # ingress.className -- (string) If an existing ingress controller is being used, this
  # must match the ingress.className of that controller.
  # Cannot be empty if ingress.install is true.
  # @section -- Ingress Parameters
  className: "traefik"
  # ingress.install -- Ingress Install Control.
  # `false`: an existing ingress controller will be used.
  # `true`: install an ingress controller in this namespace.
  # @section -- Ingress Parameters
  install: false
  # ingress.installClass -- IngressClass Install Control.
  # `false`: an existing IngressClass resource will be used.
  # `true`: create a new IngressClass in the global namespace.
  # Ignored if ingress.install is `false`.
  # @section -- Ingress Parameters
  installClass: false
  # ingress.labels -- `.metadata.labels` for the ingress.
  # If your ingress controller requires custom labels to be
  # added to ingress entries, list them here as a dictionary
  # of key/value pairs.
  # @section -- Ingress Parameters
  labels: {}
  # If your ingress requires custom annotations to be added
  # to ingress entries, they can be included here. These
  # will be added to any existing annotations in the chart.
  annotations:
    # For all ingress entries
    global: {}
    # For the master ingress only
    system: {}
    # For sessions and deployments only
    user: {}

# To configure an external Git repository, uncomment this section and fill
# in the relevant values. For more details, consult this page:
# https://enterprise-docs.anaconda.com/en/latest/admin/advanced/config-repo.html
#
# git:
#   type: github-v3-api
#   name: Github.com Repo
#   url: https://api.github.com/
#   credential-url: https://api.github.com/anaconda-test-org
#   organization: anaconda-test-org
#   repository: {owner}-{id}
#   username: somegituser
#   auth-token: 98bcf2261707794b4a56f24e23fd6ed771d6c742
#   http-timeout: 60
#   disable-tls-verification: false
#   create-args: {}

# As discussed in the documentation, you may use the same
# persistent volume for both storage resources. If so, make
# sure to use the same pvc: value in both locations.
storage:
  create: true
  pvc: "anaconda-storage"
persistence:
  pvc: "anaconda-storage"

# TOLERATIONS / AFFINITY
# Please work with the Anaconda team for assistance
# to configure these settings if you need them.
tolerations:
  # For all pods
  global: []
  # For system pods, except the ingress
  system: []
  # For the ingress daemonset alone
  ingress: []
  # For user pods
  user: []
affinity:
  # For all pods
  global: {}
  # For system pods, except the ingress
  system: {}
  # For the ingress daemonset alone
  ingress: {}
  # For user pods
  user: {}

# By default, all ops services are enabled for k3s installations.
# Consult the documentation for details on how to configure each service.
opsDashboard:
  enabled: true
opsMetrics:
  enabled: true
opsGrafana:
  enabled: true
Pre-installation checklist#
Anaconda has created this pre-installation checklist to help you verify that you have properly prepared your environment prior to installation.
K3s pre-installation checklist
All nodes in the cluster meet the minimum or recommended specifications for CPU, RAM, and disk space.
All nodes in the cluster meet the minimum IOPS required for reliable performance.
All cluster nodes are operating the same OS version, and the OS is supported.
NTP is being used to synchronize computer system clocks, and all nodes are in sync.
The user account performing the installation has sudo access on all nodes and is not a root user.
The system meets all K3s network requirements.
The firewall is either disabled or configured correctly.
If necessary, the domains required for online package mirroring have been allowlisted.
The final TLS/SSL certificates to be installed with Workbench have been obtained, including the private keys.
The Workbench A or CNAME domain record is fully operational and points to the IP address of the control plane node.
The wildcard DNS entry for Workbench is also fully operational and points to the IP address of the control plane node. More information about the wildcard DNS requirements can be found here.
The /etc/resolv.conf file on all the nodes does not include the rotate option.
Any existing installations of Docker (and dockerd), dnsmasq, and lxd have been removed from all nodes, as they will conflict with Workbench.
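As a final pass, a few of these checklist items can be spot-checked from the shell. The following is a sketch to run on every node, not an exhaustive validation.

# resolv.conf must not use the rotate option
grep -q rotate /etc/resolv.conf && echo "WARNING: remove 'rotate' from /etc/resolv.conf"

# Conflicting software must be absent
for cmd in docker dockerd dnsmasq lxd; do
  command -v "$cmd" > /dev/null && echo "WARNING: $cmd is present; remove it before installing"
done

# Clocks must be in sync and SELinux permissive or disabled
timedatectl status | grep 'System clock synchronized'
getenforce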