Mirroring channels and packages#
Data Science & AI Workbench enables you to create local copies of repositories so users can access packages from a centralized, on-premise location. This process is called mirroring. You can mirror the full content of a repository, or include only specific packages or types of packages from the repository in your mirror. You can also create mirrors in an air-gapped network to improve performance and security.
You can mirror an online repository, or you can use a tarball containing package data to populate a channel in Workbench.
- Prerequisites:
Follow the steps for administration server setup on your current machine to install the necessary tools for mirroring.
Note
It can take several hours to mirror an entire repository, depending on its size.
Creating a conda mirror#
The basic steps for creating a conda mirror are:
If necessary, create a channel in the internal Workbench repository.
Initiate the mirror by running the following command:
anaconda-mirror-ae5 --file /path/to/<mirror.yaml>
Note
Append --dry-run to the command to see what actions would be taken by the mirror, without performing actual modifications.
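The mirror command and its --dry-run flag can also be assembled from a script, for example to preview a mirror before running it for real. The following is a minimal Python sketch and not part of Workbench: the helper names build_mirror_command and run_mirror are hypothetical, and only the anaconda-mirror-ae5 executable, --file, and --dry-run come from the command shown above.

```python
import subprocess


def build_mirror_command(config_path: str, dry_run: bool = False) -> list[str]:
    """Return the argument list for the mirror command shown above.

    Hypothetical helper: only the executable name, --file, and --dry-run
    are taken from the documentation.
    """
    command = ["anaconda-mirror-ae5", "--file", config_path]
    if dry_run:
        # --dry-run reports the planned actions without modifying anything.
        command.append("--dry-run")
    return command


def run_mirror(config_path: str, dry_run: bool = False) -> int:
    """Run the mirror and return its exit code (the tool must be on PATH)."""
    return subprocess.run(build_mirror_command(config_path, dry_run)).returncode
```

A cautious workflow would call run_mirror("/path/to/main.yaml", dry_run=True) first, review the reported actions, and only then run it again without the flag.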
Preparing your mirror configuration file#
Create a <mirror>.yaml file that details the configurations for the mirror.
Tip
You can name this file whatever you’d like. Anaconda recommends naming it the same as the channel you are mirroring to.
- Basic configurations:
Define source channel locations, package platforms, and destination/storage location details. Manage package formats, clean up outdated packages, and test configurations without applying changes by including these configurations.
| Parameter | Description |
|---|---|
| | List of URLs for channels you want to mirror from. If a short channel name is supplied, Workbench uses its system-level |
| | List of platforms you want to mirror packages for. For example, If no value is supplied, the mirror will include packages for all platforms available on the source channel. |
| | The short name for the internal Workbench repository channel you are mirroring to. The rest of the channel URL is automatically completed by |
| | The web address or path where you want to store your mirrored packages. The specific formatting and necessity of this value depends on the type of destination repository. For more information, see repository-specific configurations. |
| | Determines how the mirror manages Defaults to If your repository does not support |
| | Default: |
| | Default: |
- Filtering configurations:
Fine-tune which packages are included in the mirror. Specify versions of Python or R packages that your packages should be compatible with, include only specific packages, or exclude packages by name and license family type.
| Parameter | Description |
|---|---|
| | A comma-separated list of Python versions. Restricts all Python packages and packages that depend on Python to these versions. |
| | A comma-separated list of R versions. Restricts all R packages and packages that depend on R to these versions. |
| | List of package names or valid MatchSpec strings. If supplied, only the specified packages will be mirrored, not their dependencies. Cannot be paired with |
| | List of license families to exclude from the mirror. To see a list of valid license families, use the Cannot be paired with |
| | List of package names or valid MatchSpec strings to exclude. Cannot be paired with |
| | List of package names or valid MatchSpec strings to override the mirror’s other filters and include these packages even if they would otherwise be filtered out. Cannot be paired with |
Note
For more information about MatchSpec, see package match specifications.
- Advanced configurations:
Configure repository authentication, enforce platform restrictions, and manage SSL verification for secure connections.
| Parameter | Description |
|---|---|
| | Supplies credentials for repository authentication. For more information, see repository-specific configurations. |
| | Default: |
| | Number of retry attempts for failed connections. Default: |
| | Number of failed transactions before stopping. Default: |
| | Default: |
Note
If Workbench is installed in a proxied environment, see Configuring conda in Workbench for information on setting the NO_PROXY variable.
Repository-specific configurations#
JFrog Artifactory#
For Artifactory destinations, the dest_site can be a repository hostname or a full URL.
If you supply the hostname only, anaconda-mirror interprets the channel path as:
https://<dest_site>/artifactory/<dest_channel>
If you supply a URL, anaconda-mirror appends /artifactory/<dest_channel> to it to complete the channel path. For example, if you set the dest_site to https://example.site.com, the full path is interpreted as:
https://example.site.com/artifactory/<dest_channel>
If your mirror is not at this location, you can include further pathing in the URL to reach the correct location. For example, if you set dest_site to https://example.site.com/artifactory/repo/subpath/, anaconda-mirror interprets the path literally and appends the dest_channel value to the end:
https://example.site.com/artifactory/repo/subpath/<dest_channel>
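The three cases above can be summarized in a few lines of Python. This is only an illustrative sketch of the documented behavior, not the code anaconda-mirror actually runs; the function name artifactory_channel_url is hypothetical, and treating any value without a scheme as a bare hostname is an assumption.

```python
from urllib.parse import urlsplit


def artifactory_channel_url(dest_site: str, dest_channel: str) -> str:
    """Resolve the channel path from dest_site and dest_channel.

    Hypothetical illustration of the rules described above.
    """
    if "://" not in dest_site:
        # Hostname only: https://<dest_site>/artifactory/<dest_channel>
        return f"https://{dest_site.strip('/')}/artifactory/{dest_channel}"

    base = dest_site.rstrip("/")
    if urlsplit(dest_site).path.strip("/") == "":
        # URL without further pathing: append /artifactory/<dest_channel>
        return f"{base}/artifactory/{dest_channel}"

    # URL with further pathing: taken literally, dest_channel appended
    return f"{base}/{dest_channel}"
```

For example, artifactory_channel_url("https://example.site.com/artifactory/repo/subpath/", "main") resolves to https://example.site.com/artifactory/repo/subpath/main.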
- To authenticate to a JFrog Artifactory repository:
  - Configure the username and password values in your .yaml file to contain your credentials. If both values are supplied, they are delivered using basic HTTP authentication. You can substitute an access token for your password if necessary.
  - Configure just the password value in your .yaml file. This is delivered as a bearer token using the Authorization: Bearer header. This must be an access token.
  - Configure your .netrc file to store your username and password for the repository. These values are delivered using basic HTTP authentication.
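To make the difference between the authentication options concrete, the sketch below shows how the Authorization header value would differ in each case. It is a hedged illustration in Python rather than the tool's own logic: the function name authorization_header is hypothetical, and returning None to represent the .netrc fallback is an assumption.

```python
import base64
from typing import Optional


def authorization_header(username: Optional[str], password: Optional[str]) -> Optional[str]:
    """Return the Authorization header value implied by the configured values.

    Hypothetical helper that mirrors the options described above.
    """
    if username and password:
        # Both values supplied: basic HTTP authentication.
        token = base64.b64encode(f"{username}:{password}".encode()).decode()
        return f"Basic {token}"
    if password:
        # Password only: it must be an access token, sent as a bearer token.
        return f"Bearer {password}"
    # Neither value configured: credentials would come from .netrc instead.
    return None
```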
S3 bucket#
For Simple Storage Service (S3) buckets, the channel path is a concatenation of the dest_site and dest_channel values.
For example, if you were mirroring to an S3 bucket, your dest_site would be set to <bucket_name>/full/path/to/ and the full channel path is interpreted as:
<bucket_name>/full/path/to/<dest_channel>
Authentication to an S3 source is currently controlled entirely by the environment. For example, you can use the aws CLI tool to configure the target region and authenticate. You may wish to use the AWS_PROFILE environment variable to select among multiple configurations.
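Because S3 authentication comes from the environment, a script that launches the mirror can choose the AWS configuration by setting AWS_PROFILE for that process only. The Python sketch below assumes a profile already configured with the aws CLI; the helper name mirror_environment and the profile name in the usage note are hypothetical.

```python
import os
from typing import Mapping, Optional


def mirror_environment(aws_profile: str, base_env: Optional[Mapping[str, str]] = None) -> dict:
    """Return a copy of the environment with AWS_PROFILE set.

    The caller's own environment is left unchanged; base_env defaults to
    the current process environment.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["AWS_PROFILE"] = aws_profile
    return env
```

The returned dictionary can be passed as the env argument of subprocess.run when starting the mirror, for example env=mirror_environment("mirror-admin").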
Local#
Much like the S3 bucket, the local repository channel path consists of a concatenation of the dest_site and dest_channel values.
No authentication is necessary for local repositories.
anaconda-enterprise-cli#
The dest_site value defaults to the <SITE_NAME> value established when you configure the Workbench CLI. If you have configured the CLI to access only one site (that is, your Workbench instance), there is no need to specify this value.
Authentication is handled when you log in to the CLI.
Example conda and R mirrors#
Here are some example mirror .yaml files you can use to mirror some common repositories:
Anaconda’s main channel (full)
dest_channel: main
channels:
  - https://repo.anaconda.com/pkgs/main
platforms:
  - linux-64
Anaconda’s R channel (full)
dest_channel: r
channels:
  - https://repo.anaconda.com/pkgs/r
platforms:
  - linux-64
Air-gapped network mirror
dest_channel: anaconda
channels:
  - /file/path/to/unpacked/repository/packages
platforms:
  - linux-64
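If you maintain several mirrors, files like the examples above can be generated rather than written by hand. The following Python sketch renders only the three keys that appear in those examples (dest_channel, channels, platforms). The function name render_mirror_config is hypothetical, and the sketch assumes the values need no YAML quoting.

```python
from typing import Sequence


def render_mirror_config(dest_channel: str, channels: Sequence[str], platforms: Sequence[str]) -> str:
    """Render a minimal mirror configuration as YAML text.

    Hypothetical helper; values are written as-is, so they must not
    contain characters that require YAML quoting.
    """
    lines = [f"dest_channel: {dest_channel}", "channels:"]
    lines += [f"  - {channel}" for channel in channels]
    lines.append("platforms:")
    lines += [f"  - {platform}" for platform in platforms]
    return "\n".join(lines) + "\n"
```

Writing the result of render_mirror_config("main", ["https://repo.anaconda.com/pkgs/main"], ["linux-64"]) to main.yaml reproduces the first example, following the tip to name the file after the destination channel.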
Mirroring a PyPI repository#
The full PyPI mirror size is currently close to 10TB, so ensure that your file storage location has sufficient disk space before proceeding.
Because anaconda-mirror does not handle .pip package formatting, mirrors for PyPI repositories containing such packages are managed by the anaconda-enterprise-cli tool.
The steps are identical to creating a conda mirror:
If necessary, create a channel in the internal Workbench repository.
Initiate the mirror by running the following command:
anaconda-enterprise-cli mirror pypi --config pypi-mirror.yaml
This command loads the packages on https://pypi.org into the user’s account.
Mirrored packages can be viewed at https://<FQDN>/repository/pypi/pypi/simple/, replacing <FQDN> with the fully qualified domain name of your installation of Workbench. (The second pypi in the URL should match the user configuration value described below.)
- PyPI configurations:
PyPI mirror .yaml configuration values consist of the following:
| Parameter | Description |
|---|---|
| | The local user under which the PyPI packages are imported. Default: |
| | List of package names to mirror. If supplied, only the specified packages will be mirrored, not their dependencies. Cannot be paired with |
| | List of package names to mirror. If supplied, only the specified packages will be mirrored, not their dependencies. Cannot be paired with |
| | List of package names to skip. Packages listed here are not mirrored. Cannot be paired with |
| | If supplied, only the latest package versions are mirrored. Default: |
| | The URL of the PyPI mirror. Default: |
| | A custom value for XML RPC URL. If this value is present, it takes precedence over the URL built using remote_url. Default: null. |
| | A custom value for the simple index URL. If this value is present, it takes precedence over the URL built using remote_url. Default: null. |
| | Whether to use the XML RPC API as specified by PEP381. If this is set to Default: |
| | If set to Default: |
| | Creates the mirror user as an organization instead of a regular user account. All superusers are added to the Owners group of the organization. Default: |
Note
All mirrored PyPI-like channels are publicly available to pull packages from both inside and outside Workbench (no authentication is required).
Example PyPI mirror (partial)
allowlist:
  - requests
  - six
  - numpy
  - simplejson
latest_only: true
remote_url: https://pypi.org/
use_xml_rpc: true
Configuring pip#
To configure pip to use this new mirror, create pip.conf as follows:
# Replace <WORKBENCH_URL> with the actual URL to your Workbench instance
[global]
index-url=<WORKBENCH_URL>/repository/pypi/pypi/simple/
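The same file can be produced programmatically, which helps when the Workbench URL differs between environments. The Python sketch below uses the standard library configparser module; the function name render_pip_conf is hypothetical, and the default user segment of pypi is an assumption taken from the index URL shown above.

```python
import configparser
import io


def render_pip_conf(workbench_url: str, user: str = "pypi") -> str:
    """Return pip.conf contents pointing pip at the mirrored index."""
    # interpolation=None keeps any '%' characters in the URL literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser["global"] = {
        "index-url": f"{workbench_url.rstrip('/')}/repository/pypi/{user}/simple/"
    }
    buffer = io.StringIO()
    # Write "index-url=<value>" without spaces, matching the file above.
    parser.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()
```

The returned text can be written to pip.conf in whichever location your pip installation reads its configuration from.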
To configure Workbench sessions and deployments to automatically use the pip.conf, run the following command:
anaconda-enterprise-cli spark-config --config /etc/pip.conf pip.conf
For more specific information on configuring pip, see the official pip documentation.