Migrating from AE 4 to AE 5
The process of migrating from AE 4 to AE 5 involves the following tasks:
For Administrators:
Export all packages and package information from your AE 4 Repository.
Import the packages into Anaconda Enterprise 5.
For Notebook users:
Export each project environment to a .yml file.
Due to architectural changes between versions of the platform, you may need to follow some additional steps to migrate code between AE 4 and AE 5. These steps vary based on your current and new platform configurations.
Exporting packages
Anaconda Enterprise enables you to create a site dump of all packages used by your organization, including the owners and permissions associated with each package.
Log in to the AE 4 Repo and switch to the anaconda-server user.
To export your packages, run the following command on the server hosting your Anaconda Enterprise Repository:
Running this command creates a directory structure containing all files and user information from your Anaconda Enterprise Repository. For example:
Each subdirectory of site-dump contains the contents of the Repository as it pertains to a particular user. For example, anaconda-user-1 has two packages, moto and pysocks. The meta.json file in each user directory contains information about any groups of end users that user belongs to, as well as their organizations.
Package directories contain the package files, prefixed with the id of the database. The meta.json file in each package directory contains metadata about the packages, including version, build number, dependencies, and build requirements.
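To get an overview of a site dump before importing it, the layout described above can be walked with a short script. The following is a minimal Python sketch and is not part of the Anaconda Enterprise tooling. The function name summarize_site_dump is hypothetical, and the sketch assumes that every package directory holds a meta.json file, as described above; any other subdirectory is ignored.

```python
from pathlib import Path


def summarize_site_dump(root):
    """Map each owner directory under `root` to the sorted names of its packages.

    Assumption: a subdirectory counts as a package only if it contains a
    meta.json file. Owner-level files such as the user's own meta.json are
    skipped because they are not directories.
    """
    summary = {}
    for owner_dir in sorted(Path(root).iterdir()):
        if not owner_dir.is_dir():
            continue
        packages = sorted(
            entry.name
            for entry in owner_dir.iterdir()
            if entry.is_dir() and (entry / "meta.json").is_file()
        )
        summary[owner_dir.name] = packages
    return summary
```

For a dump that contains only the anaconda-user-1 directory from the example, summarize_site_dump("site-dump") would return a dictionary mapping anaconda-user-1 to the list moto, pysocks.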
Note
Other files included in the site-dump—such as projects and environments—are NOT imported by the package import tool. That’s why users have to export their Notebook projects separately.
Importing packages¶
You can choose whether to import packages into Anaconda Enterprise 5 by username or organization, or import all packages.
Before you begin:
We recommend you compare the import options before proceeding, so you can choose the option that most closely aligns with the desired outcome for your organization.
You’ll be using the Anaconda Enterprise command line interface (CLI) to import the packages you exported, so be sure to install the AE CLI if you haven’t already.
Log in to the command line interface using the following command:
Follow the instructions below for the method you want to use to import packages.
To import packages by username or organization:
As you saw in the example above, the packages for each user are put in a separate directory in the site-dump. This means that the import process is the same whether you specify a directory based on a username or organization.
Import a single directory from the site-dump using the following command:
Replace name with the actual name of the directory you want to import.
Note
You can also pass a list of directories to import.
To import all packages:
Run the following command to import all packages in the site dump:
How channels of imported packages are named
When you import packages by username, a new channel is created for each unique label the user has applied to their packages, using the username as a prefix. (The default package label “main” is not included in channel names.)
For example, if anaconda-user-1 has the following packages:
moto-0.4.31-2.tar.bz2 with label main
pysocks-1.6.6-py35_0.tar.bz2 with label test
The following channels are created:
anaconda-user-1, containing the package file moto-0.4.31-2.tar.bz2
anaconda-user-1/test, containing the package file pysocks-1.6.6-py35_0.tar.bz2
When you import all packages in an organization, a new channel is created for each organization, group, and label. The script appends any groups associated with the organization to the channel name it creates. (The default package label “main” and default organization label “Owner” are not included in channel names.)
For example, if anaconda-organization includes a group called Devs, and the site dump for anaconda-organization contains a package file named xz-5.2.2-1.tar.bz2 with the label Test, running the script will create the following channels:
anaconda-organization – This channel contains all packages that the organization owner can access.
anaconda-organization/Devs – This channel contains all packages that the Devs group can access.
anaconda-organization/Devs/Test – This channel contains all packages labeled Test that the Devs group can access.
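The naming rules above can be summarized in a few lines of Python. This is only an illustrative sketch of the rules as stated in this section, not the import script itself. The function name channel_name is hypothetical, and the exact, case-sensitive comparison against "main" and "Owner" is an assumption.

```python
DEFAULT_LABEL = "main"   # default package label, omitted from channel names
DEFAULT_GROUP = "Owner"  # default organization group, omitted from channel names


def channel_name(owner, group=None, label=DEFAULT_LABEL):
    """Build a channel name from a user or organization name.

    The owner name is always the prefix. A group and a label are appended,
    in that order, unless they are the defaults.
    """
    parts = [owner]
    if group and group != DEFAULT_GROUP:
        parts.append(group)
    if label and label != DEFAULT_LABEL:
        parts.append(label)
    return "/".join(parts)


print(channel_name("anaconda-user-1"))                # anaconda-user-1
print(channel_name("anaconda-user-1", label="test"))  # anaconda-user-1/test
print(channel_name("anaconda-organization", group="Devs", label="Test"))
# anaconda-organization/Devs/Test
```

The sketch builds one name at a time; which combinations of group and label actually produce a channel depends on the contents of the site dump.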
Granting access to channels and packages
After everything is uploaded, each channel created as part of the import process is shared with the appropriate users and groups. In the case of the example above, anaconda-user-1 is granted read-write access to the anaconda-user-1 and anaconda-user-1/test channels, and all members of the Devs group are granted read permission for everything in the Devs channel.
You can change these access permissions as needed using the Anaconda Enterprise UI or CLI. See Managing channels and packages for more information.
Migrating AE 4 Notebook Projects
Before you begin:
If your project refers to channels in your on-premises repository or other channels on anaconda.org, ask your System Administrator to mirror those channels and make them available to you in AE 5.
If your project uses non-conda packages, you’ll need to upload those packages to AE 5.
If your notebook refers to multiple kernels or environments, set the kernel to a single environment.
If your project contains several notebooks, verify that they all are using the same kernel or environment.
Exporting your project
Exporting a project creates a yml file that includes all the environment information for the project.
Log in to your Anaconda Enterprise Notebooks server.
Open a terminal window and activate conda environment 2.6 for your project.
Install anaconda-project in the environment:
If you get a not found message, install it from anaconda.org:
Export your environment to a file:
<default> is the name of the environment where the notebook runs.
Verify that the format of the environment file looks similar to the following, and that the dependencies for each notebook in the project are listed:
If it contains any warning messages, run this script to modify the encoding and remove the warnings:
Converting your project
To create a project that’s compatible with Anaconda Enterprise 5, perform these steps:
Run the following command from an interactive shell:
AE 4 supports Linux only, so run the following command to remove the Windows and macOS platforms from the project’s anaconda-project.yml configuration file:
Run the following command to verify the platforms were removed:
Add /.indexer.pid and .git to the .projectignore file.
Run the following command to compress your project:
Note
There is a 1GB file size limit for project files, and project names cannot contain spaces or special characters.
In Anaconda Enterprise Notebooks, from your project home page, open the Workbench. Locate your project file (e.g., AENProject.tar.gz in the image below) in the file list, right-click it, and select Download.
Now your project is ready to be uploaded into Anaconda Enterprise 5.
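Two of the conversion steps above lend themselves to a quick scripted check before uploading: making sure the ignore entries are present, and confirming that the archive respects the stated limits. The Python sketch below is illustrative only and is not an Anaconda Enterprise tool. The function names are hypothetical, the 1GB limit is interpreted as 1024^3 bytes, and "special characters" is assumed to mean anything other than letters, digits, hyphens, and underscores.

```python
import re
from pathlib import Path

MAX_ARCHIVE_BYTES = 1024 ** 3  # assumption: "1GB" read as 2**30 bytes
SAFE_NAME = re.compile(r"[A-Za-z0-9_-]+")  # assumption: allowed name characters
REQUIRED_IGNORES = ("/.indexer.pid", ".git")  # entries named in the steps above


def ensure_ignored(project_dir, entries=REQUIRED_IGNORES):
    """Append missing entries to <project_dir>/.projectignore; return those added."""
    ignore_file = Path(project_dir) / ".projectignore"
    text = ignore_file.read_text() if ignore_file.exists() else ""
    present = {line.strip() for line in text.splitlines()}
    missing = [entry for entry in entries if entry not in present]
    if missing:
        # Start on a fresh line if the existing file lacks a trailing newline.
        prefix = "" if not text or text.endswith("\n") else "\n"
        ignore_file.write_text(text + prefix + "\n".join(missing) + "\n")
    return missing


def archive_problems(file_name, size_bytes):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    suffix = ".tar.gz"
    name = file_name[: -len(suffix)] if file_name.endswith(suffix) else file_name
    if not SAFE_NAME.fullmatch(name):
        problems.append("name contains spaces or special characters")
    if size_bytes > MAX_ARCHIVE_BYTES:
        problems.append("archive exceeds the 1 GB limit")
    return problems
```

A typical use would be to call ensure_ignored on the project directory before compressing it, then pass the archive's file name and Path(...).stat().st_size to archive_problems before uploading.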
Uploading your project to AE 5
Log in to the Anaconda Enterprise 5 interface and upload your project file, FILENAME.tar.gz. See Working with projects for help.
Note
To maintain performance, there is a 1GB file size limit for project files you upload. Anaconda Enterprise projects are based on Git, so we recommend you commit only text-based files relevant to a project, and keep them under 100MB. Binary files are difficult for version control systems to manage, so we recommend using storage solutions designed for that type of data, and connecting to those data sources from within your Anaconda Enterprise sessions.
Migrating code
AE 4 and AE 5 are based on different architectures, so some code inside your AE 4 notebooks might not run as expected in AE 5. AE 4 sessions ran directly on the host filesystem, where the libraries, drivers, packages, and connectors required to run them were available. AE 5 sessions run in isolated containers with their own independent file system, so they don’t necessarily have access to everything on the host.
This difference in architecture primarily impacts the following:
Connecting to external data sources
If you currently rely on ODBC/JDBC drivers to connect to specific databases such as Oracle and Impala, we recommend you instead use services that support this, such as Apache Impala and Apache Hive. Additionally, using a language- and platform-agnostic connector such as Thrift allows you to create reproducible code that is more portable.
For best practices on how to connect to different external systems inside AE5, see Connecting to the Hadoop and Spark ecosystem.
| Service/System | Recommended |
|---|---|
| Apache Impala | |
| Apache Hive | |
| Oracle | Build a conda package with their driver |
If this is not possible, we recommend you obtain or build conda packages for the connectors and drivers you need. This enables you to add them as package dependencies for your project, so they are installed when you start a Notebook session or deploy the project.
This has the added benefit of enabling you to update dependencies on connectors on a per-project basis.
Installing external dependencies
If you typically install dependencies using system package managers such as apt and yum, you can continue to do so in Anaconda Enterprise 5. However, dependencies installed from the command line are available during the current session only.
If you want them to persist across project sessions and deployments, add them as packages in the project’s anaconda-project.yml configuration file. See Configuring project settings for more information.
If your project depends on a package that is not available in your internal Anaconda Enterprise Repository, search anaconda.org for it, or build your own conda package using conda-build and then upload it to the AE 5 repository.
If you don’t have the expertise required to build the custom packages your organization needs, consider engaging our consulting team to make your mission-critical analytics libraries available as conda packages.