Managing cluster resources
After you’ve installed an Anaconda Enterprise cluster, you’ll need to keep managing and monitoring it so that it can scale as your organization’s needs change. These ongoing tasks include the following:
- When your initial Anaconda Enterprise cluster no longer meets your needs, you can add new nodes, including GPU nodes. To make these nodes available to platform users, configure resource profiles for them.
- To help you manage your organization’s cluster resources more efficiently, Anaconda Enterprise lets you track which sessions and deployments are running on specific nodes or belong to specific users. You can also monitor cluster resource usage in terms of CPU, memory, disk space, network, and GPU utilization.
- To help you gain insight into user services and troubleshoot issues, Anaconda Enterprise provides detailed logs and debugging information for the Kubernetes services it uses, along with a record of all user activity. If a master node fails, see fault tolerance in Anaconda Enterprise for what to do.