Backing up and restoring Workbench#

Backing up Data Science & AI Workbench protects your data in case of accidents (deletion of important data) or technical issues (failed hard drive). You can back up at any time, but refer to your company’s Disaster Recovery policy for best practices.

Warning

Do not attempt to restore backup files created from a different version of Workbench. To upgrade your version of Workbench, see Upgrading between versions of Workbench.

Caution

Anaconda recommends the use of managed persistence to ensure open sessions and deployments are captured by the backup process.

If you are not using managed persistence, have all users save their work, stop any open sessions and deployments, and log out of the platform during the backup process.

Note

The backup/restore script supports synchronizing your production cluster to a “hot” backup cluster at periodic intervals. This is commonly used for Disaster Recovery. To learn more about this process, please speak with our Integration Team.

Prerequisites#

You have sudo access.
The jq conda package is installed in your base environment, or you have installed the ae5-conda environment, which already contains the necessary packages.
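
The prerequisites above can be checked from a terminal before you install anything. The following is a minimal sketch, not part of the Workbench tooling: the function names `have_cmd` and `check_prereqs` are hypothetical, and the sudo test only detects passwordless sudo, so a warning there does not necessarily mean you lack access.

```shell
# Hypothetical helper: succeed if the named command is available on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Hypothetical helper: check the two prerequisites listed above.
# Returns non-zero only when jq is missing.
check_prereqs() {
    missing=0
    if ! have_cmd jq; then
        echo "jq not found: install the jq conda package or the ae5-conda environment" >&2
        missing=1
    fi
    # 'sudo -n true' succeeds only when sudo works without a password prompt.
    if ! sudo -n true 2>/dev/null; then
        echo "sudo did not run non-interactively; confirm your access with 'sudo -v'" >&2
    fi
    return "$missing"
}
```

Running `check_prereqs` before the first backup gives an early, readable failure instead of an error partway through the script.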

Install the backup script#

Note

The ae5-conda environment mentioned in the prerequisites already contains the backup script. If you choose to install the environment, skip ahead to verify your installation.

Standard environment backup script#

Install the ae5_backup_restore package into your base conda environment:

conda install -c ae5-admin ae5_backup_restore

Once complete, verify the installation.

Air-gapped environment backup script#

Download the latest ae5-conda installer file and move it to your master node.

Set the installer file to be executable, then run the script to install the ae5-conda environment:

chmod +x ae5-conda-latest-Linux-x86_64.sh
./ae5-conda-latest-Linux-x86_64.sh

Once complete, verify the installation.

Verify your installation#

Verify your installation by running the help command:

ae_backup.sh -h

If your terminal returns the usage help text, then your installation of the backup/restore script was successful! You are now ready to run the backup script.

Run the backup script#

Run the ae_backup.sh script to create backup files of your cluster in the current directory:

bash ae_backup.sh

Or specify a destination for your backup files:

bash ae_backup.sh /your/file/path/here

The backup script creates two tarball files:

ae5_config_db_YYYYMMDDHHMM.tar.gz
ae5_data_YYYYMMDDHHMM.tar.gz

Note

YYYYMMDDHHMM is the format for the timestamp of your backup data.
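
Because the timestamp is fixed-width and ordered from year down to minute, sorting the file names as plain text also sorts them chronologically. The sketch below uses that property to find the most recent backup in a directory. The function name `latest_backup_stamp` is hypothetical, and the sketch assumes the file names follow the pattern shown above.

```shell
# Hypothetical helper: print the newest YYYYMMDDHHMM stamp among the
# config backups in a directory (default: current directory).
# Prints nothing if no matching files exist.
latest_backup_stamp() {
    for f in "${1:-.}"/ae5_config_db_*.tar.gz; do
        [ -e "$f" ] || continue          # skip the unexpanded glob when nothing matches
        f=${f##*/ae5_config_db_}         # drop the directory and file name prefix
        echo "${f%.tar.gz}"              # drop the extension, leaving the stamp
    done | sort | tail -n 1
}

stamp=$(latest_backup_stamp .)
if [ -n "$stamp" ]; then
    echo "Most recent backup: $stamp"
fi
```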

The ae5_config_db file stores your Kubernetes resources and Postgres data. The ae5_data file stores your /opt/anaconda/storage data.

Note

By default, the backup script does not back up the package repository. To include it, pass the -r option described below.
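
Before relying on a backup, it can be worth confirming that the tarballs are readable. The following sketch uses only the standard `gzip` and `tar` tools. The function name `check_tarball` is hypothetical, and the check covers archive integrity only; it makes no assumption about what the archives contain.

```shell
# Hypothetical helper: succeed only if the file is a readable,
# gzip-compressed tar archive.
check_tarball() {
    gzip -t "$1" && tar -tzf "$1" >/dev/null
}

# Check every backup tarball in the current directory, if any exist.
for f in ae5_config_db_*.tar.gz ae5_data_*.tar.gz; do
    [ -e "$f" ] || continue
    if check_tarball "$f"; then
        echo "OK          $f"
    else
        echo "UNREADABLE  $f"
    fi
done
```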

Backup command line options#

-h, --help: Prints help and exits.

-d <DIR>, --ae-data <DIR>: Changes the location of the Workbench storage. The default location is /opt/anaconda/storage, and should not be changed when used on a Gravity cluster.

-b <DIR>, --backup-dir <DIR>: Changes the location where the backup files are saved. The default location is the current directory. Use when the space available in the current directory is insufficient to hold the backup.

-s, --skip-clean: If supplied, the script will not remove the intermediate files it generates during the backup process. This is useful for informational or debugging purposes.

-c, --config-db: If supplied, the script will not create a data tarball, only the config/postgres tarball. This is useful if the script is combined with an alternate mechanism for taking snapshots/backups of the data.

-r, --repository: If supplied, the script will include the full package repository in the data tarball. It doesn’t do this by default because the repository is likely to be large and incompressible.
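
As an illustration of combining these options, the sketch below wraps the backup script so that each run writes to a dated directory through the documented -b option. Everything except `ae_backup.sh` and its flags is an assumption made for the example: the `run_backup` function, the `BACKUP_ROOT` path, and the `DRY_RUN` switch are hypothetical, and placing the options before any other arguments is assumed rather than confirmed. By default the wrapper only prints the command it would run.

```shell
# Hypothetical settings for this sketch.
BACKUP_ROOT="${BACKUP_ROOT:-/mnt/backups/workbench}"   # example path, adjust to your site
DRY_RUN="${DRY_RUN:-1}"                                # 1 = print only, 0 = run the backup

# Hypothetical wrapper: back up into $BACKUP_ROOT/YYYYMMDD, passing any
# extra arguments (for example --config-db or --skip-clean) to the script.
run_backup() {
    dest="$BACKUP_ROOT/$(date +%Y%m%d)"
    if [ "$DRY_RUN" = "1" ]; then
        echo "bash ae_backup.sh -b $dest $*"
    else
        mkdir -p "$dest" && bash ae_backup.sh -b "$dest" "$@"
    fi
}

# Preview a config-only backup.
run_backup --config-db
```

Keeping the dry run as the default means the command can be reviewed before it touches the cluster. Set `DRY_RUN=0` once the printed command looks right.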

Restore from backup data#

Warning

The restore script requires both backup files to come from the same run of the backup script. Do not attempt to load files that were created by different backups.

Run the restore script to restore your cluster from previously-created backup data:

bash ae_restore.sh ae5_config_db_YYYYMMDDHHMM.tar.gz ae5_data_YYYYMMDDHHMM.tar.gz
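
One way to guard against mixing files from different backups is to compare the timestamps in the two file names before restoring. This is a sketch under the assumption that both files from a single run carry the same timestamp, which the naming scheme suggests but this page does not guarantee. The function names `backup_stamp` and `same_backup` and the example file names are hypothetical.

```shell
# Hypothetical helper: print the 12-digit YYYYMMDDHHMM stamp from a backup
# file name, or nothing if the name does not match the expected pattern.
backup_stamp() {
    basename "$1" .tar.gz | sed -n 's/^ae5_[a-z_]*_\([0-9]\{12\}\)$/\1/p'
}

# Hypothetical helper: succeed only if both names carry the same stamp.
same_backup() {
    cfg_stamp=$(backup_stamp "$1")
    data_stamp=$(backup_stamp "$2")
    [ -n "$cfg_stamp" ] && [ "$cfg_stamp" = "$data_stamp" ]
}

# Example with made-up file names.
if same_backup ae5_config_db_202401020304.tar.gz ae5_data_202401020304.tar.gz; then
    echo "timestamps match"
else
    echo "timestamps differ, do not restore this pair" >&2
fi
```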

Restoration modes#

The restore script has three different modes for data restoration that can be used to customize how Workbench is restored.

Restoring to the original host#

In this mode, all resources are restored from backup, except for the base ingress specification.

Use this mode after a clean reinstall of an existing cluster, when you want to perform a full restoration from backup. User workloads (deployments, sessions, and jobs) are restored but placed in a paused state. The script provides instructions for unpausing them once the administrator is satisfied that the restoration has completed successfully.

Restoring to a different host without a hostname change#

In this mode, only some resources are restored, as described below.

Restored data:

Kubernetes secrets (non-SSL)
User/Project Data
Postgres

Non-restored data:

Hostname
SSL certificates
Configmaps
Ingress
Kubernetes resources for user workload

Use this mode to restore the backup to a separate existing cluster for inspection. Because the target cluster keeps its own configuration, it continues to operate normally but remains disconnected from the source cluster.

Restoring to a different host, but with a hostname change#

This mode fully restores all resources, including the deployments and scheduled jobs. The ingress is also updated in this case to reflect the new hostname. This is used if you need to replace a faulty master node with a hot backup that was already running under a different hostname.

Restoration command line options#

-h, --help: Prints help and exits.

-d <DIR>, --ae-data <DIR>: Changes the location of the Workbench storage. The default location is /opt/anaconda/storage, and should not be changed when used on a Gravity cluster.

-b <DIR>, --backup-dir <DIR>: Changes the location where the backup files are found. The default location is the current directory. Use when the backup files are stored somewhere other than the current directory.

-s, --skip-clean: If supplied, the script will not remove the intermediate files it generates during the restore process. This is useful for informational or debugging purposes.

-u, --update-hostname: If supplied and necessary, the script modifies the local hostname to match the backup content. Otherwise, the script preserves the local hostname.

-c, --config-only: If supplied, the script only restores the configuration data (SSL, secrets, configmaps, etc.). It does not modify the Postgres database or the data.

-w, --wait: When the system pods are restarted, wait for them to stabilize before exiting the script.

-p, --pause: Leaves the cluster in a paused state upon completion of the restore process.

-y, --yes: The restore script does not ask for confirmation before proceeding. Use with care.

Bring your own Kubernetes#

Customer-supplied Kubernetes clusters (non-Gravity) can also use this backup/restore script. However, the backup/restore process is slightly different.

When taking a backup, supply the -c, --config-db command line option, because the backup script can only capture your Workbench configuration data. It does not capture user/project data, so make sure you take regular backups of your own storage solution. This includes the Persistent Volumes used for both anaconda-storage and anaconda-persistence that were configured at install time.

When restoring from a backup, supply the -c, --config-only command line option, because the restore script can only restore your Workbench configuration data. It does not restore user/project data, so make sure you have also restored a backup of your own storage solution.