Adding MLflow to Anaconda Enterprise 5#

You can install MLflow as an optional component of Anaconda Enterprise.

Prerequisites#

You must have managed persistence enabled.

Set environment variables#

Setting these environment variables makes MLflow accessible from all sessions, deployments, and schedules, and sets the deployment-wide values for the MLflow tracking server endpoint.

  1. Connect to your instance of Anaconda Enterprise. Get help from your IT administrator with this step if necessary.

  2. View a list of your configmaps by running the following command:

    kubectl get cm
    
  3. Edit the anaconda-enterprise-env-var-config.yml file by running the following command:

    kubectl edit cm anaconda-enterprise-env-var-config
    
  4. Include the following lines:

    # Replace <AE5_FQDN> with your Anaconda Enterprise fully qualified domain name
    MLFLOW_DISABLE_ENV_MANAGER_CONDA_WARNING: "TRUE"
    MLFLOW_TRACKING_URI: "https://mlflow.<AE5_FQDN>"
    MLFLOW_REGISTRY_URI: "https://mlflow.<AE5_FQDN>"
    

    Note

    If your ENV_VAR_PLACEHOLDER: foo entry still exists, replace it now.

    Here is an example of what your file might look like when complete:

    ../../_images/ae5_mlflow_env_var_config_file.png
  5. Save your work and close the file.

  6. Update Anaconda Enterprise with your changes and restart services by running the following command:

    kubectl get pods | grep 'ap-deploy\|ap-workspace' | cut -d' ' -f1 | xargs kubectl delete pods
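
Once the pods have restarted, you may want to confirm from a new session that the variables are visible. The following is a minimal sketch rather than part of the Anaconda Enterprise tooling: the helper names `expected_mlflow_env` and `find_env_mismatches` and the domain `ae5.example.com` are hypothetical and stand in for your own values.

```python
import os


def expected_mlflow_env(fqdn):
    """Build the values from step 4 for a given fully qualified domain name."""
    uri = f"https://mlflow.{fqdn}"
    return {
        "MLFLOW_DISABLE_ENV_MANAGER_CONDA_WARNING": "TRUE",
        "MLFLOW_TRACKING_URI": uri,
        "MLFLOW_REGISTRY_URI": uri,
    }


def find_env_mismatches(fqdn, environ=None):
    """Return {name: (expected, actual)} for variables that are missing or differ."""
    environ = os.environ if environ is None else environ
    mismatches = {}
    for name, expected in expected_mlflow_env(fqdn).items():
        actual = environ.get(name)
        if actual != expected:
            mismatches[name] = (expected, actual)
    return mismatches


if __name__ == "__main__":
    # "ae5.example.com" is a placeholder; substitute your own <AE5_FQDN>.
    problems = find_env_mismatches("ae5.example.com")
    for name, (expected, actual) in problems.items():
        print(f"{name}: expected {expected!r}, found {actual!r}")
    if not problems:
        print("All MLflow environment variables are set as expected.")
```

An empty result suggests the configmap changes reached the session. A missing variable usually means the session was started before the pods were restarted.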
    

Download MLflow#

  1. Download the latest MLflow release.

  2. Extract all files from the tarball you just downloaded.

Tip

Keep the extracted files somewhere that is easy for you to locate.

Install MLflow#

  1. Open a browser and navigate to Anaconda Enterprise.

  2. Log in with an administrator account that has managed persistence permissions.

  3. Click Create +, then select Upload Project from the dropdown menu.

  4. Click Browse.

  5. Locate the extracted files from your download and select the MLflowTrackingServerProject-<VERSION>.tar.gz file.

  6. Provide a name for your MLflow Server project.

  7. Click Upload.

    ../../_images/ae5_mlflow_project_upload.png
  8. Open a session for your new MLflow Server project.

  9. Upload the migrate-<VERSION>.py and anaconda-platform-mlflow-runtime-<VERSION>.tar.gz files to the root of the project.

    Caution

    Do not commit changes to the project until instructed. If you attempt to commit too early, you will receive an error due to the size of the runtime file.

  10. Open a terminal in your session, and run the following command:

    # Replace <VERSION> with your release version
    python migrate-<VERSION>.py
    
  11. Activate your new environment by running the following command:

    conda activate /tools/mlflow/mlflow_env/
    
  12. Verify your installation was successful by running the following command:

    mlflow
    

    Tip

    If the installation was successful, the command prints the mlflow usage text, including a list of available commands.

  13. Open the anaconda-project.yml file in the project with your preferred file editor.

  14. Update the MLFLOW_TRACKING_GC_TTL value to something that makes sense for your use case.

    ../../_images/ae5_mlflow_set_gc_ttl.png

    Note

    The MLFLOW_TRACKING_GC_TTL variable instructs MLflow to perform garbage collection on deleted artifacts that have reached a specified age.

  15. Commit the changes you’ve made to the project.

    Note

    It is not necessary to commit the migrate-<VERSION>.py file to the project. Once installation is complete, you can safely delete this file.

  16. Stop the project session.
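
The verification in step 12 can also be scripted, for example to repeat it after an upgrade. This is a minimal sketch that assumes it runs in a session terminal with the `/tools/mlflow/mlflow_env/` environment already activated. The function name `mlflow_version` is hypothetical. It relies only on the `mlflow --version` option of the MLflow command-line interface.

```python
import shutil
import subprocess


def mlflow_version(executable="mlflow"):
    """Return the output of `<executable> --version`, or None if unavailable."""
    path = shutil.which(executable)
    if path is None:
        # The executable is not on PATH, so the environment is likely inactive.
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    if result.returncode != 0:
        return None
    return result.stdout.strip()


if __name__ == "__main__":
    version = mlflow_version()
    if version is None:
        print("mlflow was not found; activate /tools/mlflow/mlflow_env/ first.")
    else:
        print(version)
```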

Deploy MLflow#

  1. Select Deployments from the left-hand navigation.

  2. Click Deploy.

  3. Enter a name for the deployment, set the static URL to the same value used in your anaconda-enterprise-env-var-config.yml file (https://mlflow.<AE5_FQDN>), and then click Deploy.

    ../../_images/ae5_mlflow_deployment.png

Provide Access#

  1. Select the deployment you just created.

  2. Select Share from the left-hand navigation menu.

  3. Enter the names of users or groups to provide with permissions to access MLflow, then click Add.

    ../../_images/ae5_mlflow_add_collaborators.png

    Note

    This list is populated from Keycloak.

  4. Select Settings from the left-hand menu.

  5. Click Generate to create a token for this deployment.

    ../../_images/ae5_mlflow_generate_token.png

    Note

    Every user who needs API access to MLflow also requires this token. You must share this token manually.

    The administrator of the MLflow Tracking Server must generate and provide a new access token each time the server is restarted.
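
To illustrate how a shared token might be used for API access, the sketch below builds (but does not send) an authenticated request. It assumes the token is supplied through the `MLFLOW_TRACKING_TOKEN` environment variable, which the MLflow client sends as a bearer token. Whether your deployment expects the token in exactly this form is an assumption to verify. The function name `build_tracking_request`, the example token, and the example domain are hypothetical.

```python
import os
import urllib.request


def build_tracking_request(path, environ=None):
    """Build a request to the tracking server that carries the shared token."""
    environ = os.environ if environ is None else environ
    base = environ["MLFLOW_TRACKING_URI"].rstrip("/")
    token = environ["MLFLOW_TRACKING_TOKEN"]
    request = urllib.request.Request(f"{base}{path}")
    # Assumption: the deployment accepts the token as an HTTP bearer token.
    request.add_header("Authorization", f"Bearer {token}")
    return request


if __name__ == "__main__":
    # Placeholder values; in a real session these come from the environment.
    example_env = {
        "MLFLOW_TRACKING_URI": "https://mlflow.ae5.example.com",
        "MLFLOW_TRACKING_TOKEN": "example-token",
    }
    request = build_tracking_request("/health", example_env)
    print(request.full_url)
    print(request.get_header("Authorization"))
```

Because the token changes whenever the server is restarted, reading it from the environment rather than hard-coding it keeps client code unchanged when a new token is issued.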

Set up garbage collection#

When a client deletes a resource, MLflow transitions the resource into a deleted lifecycle state and hides it in the UI, but does not purge it from the system. A deleted resource blocks the creation of new resources with the same name until the garbage collection process has purged it.

The garbage collection process works in tandem with the MLFLOW_TRACKING_GC_TTL variable that is set in the anaconda-project.yml project file. A resource is purged only when it has reached the age specified by MLFLOW_TRACKING_GC_TTL and the garbage collection process runs.
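
The interaction between the two conditions can be sketched as follows. This is an illustrative model only, not MLflow's implementation: the names `is_purgeable` and `run_gc` are hypothetical, and the TTL is passed as a `timedelta` because the format of the MLFLOW_TRACKING_GC_TTL value is not assumed here.

```python
from datetime import datetime, timedelta, timezone


def is_purgeable(deleted_at, ttl, now):
    """True once a deleted resource has reached the TTL age."""
    return now - deleted_at >= ttl


def run_gc(deleted, ttl, now):
    """Model one garbage collection run over {name: deleted_at}.

    Returns (purged, held): names removed by this run, and names that are
    still in the deleted state and therefore still block reuse of the name.
    """
    purged = sorted(n for n, t in deleted.items() if is_purgeable(t, ttl, now))
    held = sorted(n for n in deleted if n not in purged)
    return purged, held


if __name__ == "__main__":
    now = datetime(2024, 1, 31, tzinfo=timezone.utc)
    deleted = {
        "old-model": datetime(2024, 1, 1, tzinfo=timezone.utc),
        "new-model": datetime(2024, 1, 30, tzinfo=timezone.utc),
    }
    purged, held = run_gc(deleted, timedelta(days=7), now)
    print("purged:", purged)
    print("still blocking their names:", held)
```

In this model nothing is purged between runs, which is why the process needs to be scheduled: a resource past its TTL continues to block its name until the next run.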

  1. Create a schedule within the MLflow Server project.

  2. Name the schedule MLflow Garbage Collection.

  3. Open the Deployment Command dropdown menu and select gc.

  4. Schedule an interval at which to run the garbage collection. Custom schedules use cron expressions.

  5. Click Schedule.

    ../../_images/ae5_mlflow_schedule_gc.png
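
As an aid for writing a custom schedule, the sketch below splits a cron expression into named fields. It assumes the standard five-field cron syntax (minute, hour, day of month, month, day of week); confirm the exact format that your version of Anaconda Enterprise accepts. The helper name `describe_cron` is hypothetical.

```python
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def describe_cron(expression):
    """Map each field of a five-field cron expression to its name."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(
            f"expected {len(CRON_FIELDS)} fields, got {len(parts)}: {expression!r}"
        )
    return dict(zip(CRON_FIELDS, parts))


if __name__ == "__main__":
    # "0 2 * * 0" reads as: minute 0, hour 2, any day of month, any month, Sunday.
    for name, value in describe_cron("0 2 * * 0").items():
        print(f"{name}: {value}")
```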

Upgrading MLflow#

  1. Download the latest MLflow release.

  2. Open a browser and navigate to Anaconda Enterprise.

  3. Log in with an administrator account that has managed persistence permissions.

  4. Open your MLflow Server project.

  5. Select Deployments from the left-hand navigation.

  6. Terminate your current deployment.

  7. Select Schedules from the left-hand navigation.

  8. Pause all scheduled runs.

  9. Start a new session in your MLflow Server project.

  10. Upload your newly obtained migrate-<VERSION>.py and anaconda-platform-mlflow-runtime-<VERSION>.tar.gz files to the root of the project.

  11. Open a terminal in your session and run the following command:

    # Replace <VERSION> with your release version
    python migrate-<VERSION>.py
    
  12. Redeploy your MLflow Server project.

  13. Generate a new token to share your deployment.

    Note

    Remember, you must generate an access token and provide it to users each time the server is restarted!

  14. Restart all scheduled runs.