Managing cluster resources#
After you’ve installed an Anaconda Enterprise cluster, you’ll need to continue managing and monitoring the cluster to ensure that it scales with your organization. These on-going tasks include:
Adding nodes: If you outgrow your initial Anaconda Enterprise cluster installation, you can always add nodes to your cluster—including GPUs. Once added, make these nodes available to platform users by configuring resource profiles.
Monitoring and managing resources: To help you manage your organization’s cluster resources more efficiently, Anaconda Enterprise enables you to monitor which sessions and deployments are running on specific nodes or by specific users. You can also monitor cluster resource usage in terms of CPU, memory, disk space, network, and GPU utilization.
Troubleshooting and debugging: To help you gain insights into user services and troubleshoot issues, Anaconda Enterprise provides detailed logs and debugging information related to the Kubernetes services it uses, as well as all activity performed by users. See fault tolerance in Anaconda Enterprise for information about what to do if a master node fails.