Mirroring channels and packages

Workbench enables you to create a local copy of a repository so users can access the packages from a centralized, on-premises location.

The mirror can be complete, partial, or include specific packages or types of packages. You can also create a mirror in an air-gapped environment to help improve performance and security.

Note

It can take hours to mirror the full repository.

Before you can use Workbench’s syncing tools to configure local mirrors for channels and packages, you need to configure access to the source of the packages to be mirrored: either an online repository or, for an air-gapped installation, a tarball.

Log in to Workbench as an existing user using the following command:

$ anaconda-enterprise-cli login
Username: anaconda-enterprise
Password:
Logged anaconda-enterprise in!

Note

If Workbench is installed in a proxied environment, see Mirroring in a proxied environment for information on setting the NO_PROXY variable.


Mirroring the Anaconda repository

Anaconda recommends the following process as a best practice for mirroring the Anaconda repository to Workbench.

  1. Instead of using the default anaconda.yaml file included in the mirror tool installation, create two YAML files: one for mirroring the main channel and another for mirroring the free channel.

    Example main.yaml file:

    dest_channel: main
    channels:
      - https://repo.anaconda.com/pkgs/main
    platforms:
      - linux-64
      - noarch
    

    Example free.yaml file:

    dest_channel: free
    channels:
      - https://repo.anaconda.com/pkgs/free
    platforms:
      - linux-64
      - noarch
    
  2. If you saved both files to your home directory, run the following commands to mirror these channels. Otherwise, adjust the paths to match where you saved the files:

    cas-sync-api-v5 --file ~/main.yaml
    cas-sync-api-v5 --file ~/free.yaml
    

    This mirrors all of the packages from these channels in the Anaconda repository. If the channel doesn’t already exist, it will be automatically created and shared with all authenticated users. You can customize the permissions on the mirrored packages by sharing the channel.

    Tip

    If you plan to mirror these channels on a regular basis, consider adding the -c flag to get a clean mirror each time. This automatically removes from your internal repository any packages that were removed from the Anaconda repository since the previous mirror, excluding any packages your organization has blocklisted.

  3. Verify that the mirror was successful by logging in to your account and navigating to the Packages tab. You should see a list of the mirrored packages.
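If you later mirror more channels the same way, generating the files keeps them consistent, because they differ only in dest_channel and the channel URL. The following Python sketch does that with the standard library only. It is an illustration, not part of the mirror tool: channel_config and write_configs are hypothetical helper names, and the output has the same shape as the main.yaml and free.yaml examples above.

```python
from pathlib import Path
from typing import List, Sequence


def channel_config(channel: str,
                   base_url: str = "https://repo.anaconda.com/pkgs",
                   platforms: Sequence[str] = ("linux-64", "noarch")) -> str:
    """Render a mirror config shaped like main.yaml and free.yaml above."""
    lines = [
        f"dest_channel: {channel}",
        "channels:",
        f"  - {base_url}/{channel}",
        "platforms:",
    ]
    lines += [f"  - {platform}" for platform in platforms]
    return "\n".join(lines) + "\n"


def write_configs(channels: List[str], out_dir: Path) -> List[Path]:
    """Write one <channel>.yaml per channel into out_dir and return the paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for channel in channels:
        path = out_dir / f"{channel}.yaml"
        path.write_text(channel_config(channel))
        paths.append(path)
    return paths
```

For example, write_configs(["main", "free"], Path.home()) writes ~/main.yaml and ~/free.yaml, the paths used by the cas-sync-api-v5 commands in step 2.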


Mirroring a PyPI repository

The full PyPI mirror is currently close to 4 TB, so make sure your file storage location has sufficient disk space before proceeding. Rather than mirror the entire PyPI repository, you can use a configuration file such as $PREFIX/etc/anaconda-platform/mirrors/pypi.yaml to customize the mirror behavior and specify the subset of packages you want to mirror.

To create a PyPI mirror:

anaconda-enterprise-cli mirror pypi --config pypi.yaml

This command loads the packages from https://pypi.org into the pypi user account. You can view the mirrored packages at <https://anaconda.example.com>/repository/pypi/pypi/simple/, replacing <https://anaconda.example.com> with the URL of your Workbench installation. The second pypi in the URL must match the user configuration value described below.

The following configuration options are available for you to customize your configuration file:

user

The local user under which the PyPI packages are imported. Default: pypi.

pkg_list

A list of packages to mirror. Only packages listed are mirrored. If this is set, blocklist and allowlist settings are ignored. Default: [].

allowlist

A list of packages to mirror. Only packages listed are mirrored. If the list is empty, all packages are checked. Default: [].

blocklist

A list of packages to skip. The packages listed are ignored. Default: [].

latest_only

Only download the latest versions of the packages. Default: false.

remote_url

The URL of the PyPI repository to mirror from. /pypi is appended to build the XML RPC API URL, /simple to build the simple index URL, and /pypi/{package}/{version}/json to build the JSON API URL. Default: https://pypi.python.org/.

xml_rpc_api_url

A custom value for XML RPC URL. If this value is present, it takes precedence over the URL built using remote_url. Default: null.

simple_index_url

A custom value for the simple index URL. If this value is present, it takes precedence over the URL built using remote_url. Default: null.

use_xml_rpc

Whether to use the XML RPC API as specified by PEP 381. If this is set to true, the XML RPC API is used to determine which packages to check. Otherwise, the script falls back to the simple index. If the XML RPC call fails, the simple index is used. Default: true.

use_serial

Whether to use the serial number provided by the XML RPC API. Only packages updated since the last serial saved are checked. If this is set to false, all PyPI packages are checked for updates. Default: true.

create_org

Create the mirror user as an organization instead of a regular user account. All superusers are added to the “Owners” group of the organization. Default: false.
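The remote_url, xml_rpc_api_url, and simple_index_url options together determine which upstream endpoints the mirror contacts. The Python sketch below models that rule as described above; it is not the mirror script's code, PypiEndpoints and resolve_endpoints are hypothetical names, and dropping a trailing slash from remote_url before appending the suffixes is an assumption made to avoid double slashes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PypiEndpoints:
    xml_rpc: str
    simple_index: str
    json_template: str  # filled in later with a package name and version


def resolve_endpoints(remote_url: str = "https://pypi.python.org/",
                      xml_rpc_api_url: Optional[str] = None,
                      simple_index_url: Optional[str] = None) -> PypiEndpoints:
    # Assumption: a trailing slash is dropped before suffixes are appended.
    base = remote_url.rstrip("/")
    return PypiEndpoints(
        # Explicit values take precedence over URLs built from remote_url.
        xml_rpc=xml_rpc_api_url or f"{base}/pypi",
        simple_index=simple_index_url or f"{base}/simple",
        json_template=base + "/pypi/{package}/{version}/json",
    )


def json_url(endpoints: PypiEndpoints, package: str, version: str) -> str:
    """Return the JSON API URL for one release of one package."""
    return endpoints.json_template.format(package=package, version=version)
```

With the defaults, resolve_endpoints() yields https://pypi.python.org/pypi for XML RPC and https://pypi.python.org/simple for the simple index.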

Note that all mirrored PyPI-like channels are publicly available: packages can be pulled from them both inside and outside the cluster without an authentication token.

EXAMPLE:

allowlist:
  - requests
  - six
  - numpy
  - simplejson
latest_only: true
remote_url: https://pypi.org/
use_xml_rpc: true
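The example above relies on how pkg_list, allowlist, and blocklist interact. The following Python sketch models that selection as documented: a non-empty pkg_list overrides the other two, and an empty allowlist means every package is a candidate. Two details are assumptions rather than documented behavior: names are compared after PEP 503 normalization, and the blocklist is applied after the allowlist. select_pypi_packages is a hypothetical name.

```python
import re
from typing import Iterable, List, Optional


def normalize(name: str) -> str:
    # PEP 503 name normalization (assumed here), so "Django" matches "django".
    return re.sub(r"[-_.]+", "-", name).lower()


def select_pypi_packages(available: Iterable[str],
                         pkg_list: Optional[List[str]] = None,
                         allowlist: Optional[List[str]] = None,
                         blocklist: Optional[List[str]] = None) -> List[str]:
    """Sketch of the documented precedence for the PyPI mirror options."""
    names = list(available)
    if pkg_list:
        # pkg_list wins: allowlist and blocklist are ignored when it is set.
        wanted = {normalize(n) for n in pkg_list}
        return [n for n in names if normalize(n) in wanted]
    allowed = {normalize(n) for n in (allowlist or [])}
    blocked = {normalize(n) for n in (blocklist or [])}
    selected = []
    for n in names:
        key = normalize(n)
        # An empty allowlist means every package is checked.
        if allowed and key not in allowed:
            continue
        # Assumption: the blocklist is applied after the allowlist.
        if key in blocked:
            continue
        selected.append(n)
    return selected
```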

Configuring pip

To configure pip to use this new mirror, create pip.conf as follows:

[global]
index-url=<https://anaconda.example.com>/repository/pypi/pypi/simple/

Replace <https://anaconda.example.com> with the URL of your Workbench installation.

To configure Workbench sessions and deployments to use this pip.conf automatically, run the following command:

anaconda-enterprise-cli spark-config --config /etc/pip.conf pip.conf

Alternatively, you can pass the --index-url flag directly when invoking pip. For example:

pip install --index-url <https://anaconda.example.com>/repository/pypi/pypi/simple/ <package_name>

Replace <https://anaconda.example.com> with the URL of your Workbench installation, and <package_name> with the name of a package in your local mirror. In the example URL, the second pypi must match the user configuration value described above.

For more specific information on configuring pip, refer to the official documentation at https://pip.pypa.io/en/stable/user_guide/#config-file.
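Because the second pypi in the index URL is the mirror's user value, deriving the URL instead of editing it by hand avoids mismatches. This Python sketch builds the index URL, the pip.conf contents, and the pip argument list shown above. The helper names are hypothetical, and https://anaconda.example.com remains a placeholder for your installation.

```python
from typing import List


def mirror_index_url(workbench_url: str, user: str = "pypi") -> str:
    # Layout described on this page: <workbench>/repository/pypi/<user>/simple/
    return f"{workbench_url.rstrip('/')}/repository/pypi/{user}/simple/"


def pip_conf(workbench_url: str, user: str = "pypi") -> str:
    # Same [global] index-url setting as the pip.conf example above.
    return f"[global]\nindex-url={mirror_index_url(workbench_url, user)}\n"


def pip_install_args(workbench_url: str, package: str,
                     user: str = "pypi") -> List[str]:
    # Equivalent to: pip install --index-url <index> <package>
    return ["pip", "install", "--index-url",
            mirror_index_url(workbench_url, user), package]
```

The list returned by pip_install_args can be passed to subprocess.run if you want to script installs against the mirror.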


Mirroring specific packages

Alternatively, you may not want to mirror all packages. In this case, edit the provided mirror files to specify which platforms or specific packages you want to mirror, or use the allowlist, blocklist, or license_blocklist options to control which packages are mirrored. You cannot combine a package list (pkg_list) with the allowlist, blocklist, or license_blocklist options. For more information, see Mirror configuration options. Then pass your edited file to cas-sync-api-v5:

cas-sync-api-v5 --file ~/my-custom-anaconda.yaml

Mirroring R packages

An example configuration file for mirroring R packages is also provided:

# This is the destination channel for mirrored packages in your local repository.
dest_channel: r

# conda packages from these channels are mirrored to dest_channel on your local repository.
channels:
  - https://repo.anaconda.com/pkgs/r/

# If mirroring from an air-gapped tarball, point channels to the expanded tarball:
# channels:
#   - file:///path-to-expanded-tarball/repo-mirrors-<date>/r/pkgs/

# Only conda packages of these platforms are mirrored.
# Omitting this will mirror packages for all platforms available on specified channels.
# If the repository will only be used to install packages on the v5 system, it only needs linux-64 packages.
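platforms:
  - linux-64

To mirror the R channel, run the following command, adjusting the path to match where the r.yaml file is located:

cas-sync-api-v5 --file ~/cas-mirror/etc/anaconda-platform/mirrors/r.yaml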

Mirroring in an air-gapped environment

To mirror the repository on a system with no internet access, create a local copy of the repository by extracting the air-gapped tarball, and then point cas-sync-api-v5 at the extracted files.

This example extracts the tarball to /tmp:

# Replace <path to> with the actual path to the mirror file.
cd /tmp
tar xvf <path to>/mirror.tar

Now you have a local file-system repository located at /tmp/mirror/pkgs. You can mirror this repository by editing <path to cas-mirror>/etc/anaconda-platform/mirrors/anaconda.yaml to contain:

channels:
  - /tmp/mirror/pkgs

Then run the following command:

cas-sync-api-v5 --file <path to cas-mirror>/etc/anaconda-platform/mirrors/anaconda.yaml

This mirrors the contents of the local file-system repository to your Workbench installation under the username anaconda.
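Before running cas-sync-api-v5, it can help to confirm that the path under channels really is a conda channel. The sketch below assumes the standard conda channel layout, in which each platform subdirectory (such as linux-64 or noarch) holds a repodata.json index; platforms_with_repodata is a hypothetical helper, not part of the mirror tool.

```python
from pathlib import Path
from typing import List


def platforms_with_repodata(channel_dir: str) -> List[str]:
    """Return the platform subdirectories of a local conda channel that
    contain a repodata.json index, sorted by name."""
    root = Path(channel_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"channel directory not found: {root}")
    return sorted(
        sub.name
        for sub in root.iterdir()
        if sub.is_dir() and (sub / "repodata.json").is_file()
    )
```

For example, platforms_with_repodata("/tmp/mirror/pkgs") should list the platforms you expect to mirror; an empty list suggests that channels points at the wrong directory.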


Configuring Workbench

After creating the mirror, edit your Workbench configuration to add this new mirrored channel to the default Workbench channels and make the packages available to users.

conda:
  channels:
  - defaults
  default_channels:
  - main
  - free
  - r
  channel_alias: https://<anaconda.example.com>/repository/conda

Replace <anaconda.example.com> with the hostname of your Workbench installation.
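With this configuration, conda expands defaults to the entries in default_channels and resolves bare channel names against channel_alias. The Python sketch below is a simplified model of that resolution (not conda's implementation; resolve_channels is a hypothetical name). It is useful for checking the URLs at which the mirrored main, free, and r channels are expected to be served.

```python
from typing import List


def resolve_channels(channels: List[str], default_channels: List[str],
                     channel_alias: str) -> List[str]:
    """Simplified model of how conda turns channel names into URLs."""
    alias = channel_alias.rstrip("/")
    names: List[str] = []
    for channel in channels:
        # "defaults" stands for every entry in default_channels.
        names.extend(default_channels if channel == "defaults" else [channel])
    # Full URLs are used as-is; bare names are appended to channel_alias.
    return [n if "://" in n else f"{alias}/{n.strip('/')}" for n in names]
```

For example, with channel_alias set to https://anaconda.example.com/repository/conda, the main channel resolves to https://anaconda.example.com/repository/conda/main.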

Note

The ap-workspace pod must be restarted for the configuration change to take effect on new project editor sessions.

To update the Workbench server with your changes, you’ll need to do the following:

  1. Run the following command in an interactive shell to identify the pod associated with the workspace services:

    kubectl get pods
    
  2. Restart the workspace services by running the following command:

    kubectl delete pod anaconda-enterprise-ap-workspace-<pod ID>
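The two steps above can be scripted. This Python sketch is an illustration, not a supported tool. It assumes kubectl is on your PATH and already targets the Workbench cluster and namespace, and workspace_pods and restart_workspace are hypothetical names. It lists pods with kubectl get pods -o name, which prints one pod/<name> line per pod, and by default performs a dry run that only returns the delete commands.

```python
import subprocess
from typing import List

PREFIX = "anaconda-enterprise-ap-workspace-"


def workspace_pods(kubectl_output: str) -> List[str]:
    """Pick workspace pod names out of `kubectl get pods -o name` output."""
    names = []
    for line in kubectl_output.splitlines():
        # Each line looks like pod/<name>; keep only the name part.
        name = line.strip().split("/", 1)[-1]
        if name.startswith(PREFIX):
            names.append(name)
    return names


def restart_workspace(dry_run: bool = True) -> List[List[str]]:
    """Build (and, unless dry_run, run) a delete command per workspace pod."""
    out = subprocess.run(["kubectl", "get", "pods", "-o", "name"],
                         capture_output=True, text=True, check=True).stdout
    commands = [["kubectl", "delete", "pod", name] for name in workspace_pods(out)]
    if not dry_run:
        for command in commands:
            subprocess.run(command, check=True)
    return commands
```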
    

Sharing channels

To make your new channels visible to your users in their Channels list, you need to share the channels with them.

EXAMPLE: To share the new channels main, free, and r with the everyone group for read access:

anaconda-enterprise-cli channels share --group everyone --level r main
anaconda-enterprise-cli channels share --group everyone --level r free
anaconda-enterprise-cli channels share --group everyone --level r r

After running the share commands, verify the result by logging in to the user interface and viewing the Channels list.

For more information, see Sharing channels and packages.
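When you mirror many channels, the share commands can be generated rather than typed one by one. This Python sketch builds one anaconda-enterprise-cli channels share invocation per channel, using the same flags as the example above. share_commands and share_all are hypothetical names, and by default share_all only returns the commands (a dry run) so you can review them first.

```python
import subprocess
from typing import List


def share_commands(channels: List[str], group: str = "everyone",
                   level: str = "r") -> List[List[str]]:
    """Build one `channels share` invocation per channel."""
    return [["anaconda-enterprise-cli", "channels", "share",
             "--group", group, "--level", level, channel]
            for channel in channels]


def share_all(channels: List[str], group: str = "everyone", level: str = "r",
              dry_run: bool = True) -> List[List[str]]:
    commands = share_commands(channels, group, level)
    if not dry_run:
        for command in commands:
            # check=True stops at the first command that exits non-zero.
            subprocess.run(command, check=True)
    return commands
```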


Mirror configuration options

You can use the following options to configure your mirror:

remote_url

Specifies the remote URL from which the conda packages and the Anaconda and Miniconda installers are downloaded. The default value is: https://repo.continuum.io/.

channels

Specifies the remote channels from which conda packages are downloaded. The default is a list of the channels <remote_url>/pkgs/free/ and <remote_url>/pkgs/pro/.

dest_channel

Specifies the destination channel to which files are uploaded. The default value is: anaconda.

Include all of these options in the same file, and pass that file to the cas-sync-api-v5 command with the --file argument:

cas-sync-api-v5 --file ~/cas-mirror/etc/anaconda-platform/mirrors/anaconda.yaml
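A mirror file that sets only some of these options inherits the defaults for the rest. For example, a file that sets only channels uploads to the anaconda destination channel. The Python sketch below models that merge using the defaults listed above. It assumes the tool fills in missing keys this way and strips a trailing slash before building the default channel URLs; with_defaults is a hypothetical name.

```python
from typing import Any, Dict

DEFAULT_REMOTE_URL = "https://repo.continuum.io/"


def with_defaults(config: Dict[str, Any]) -> Dict[str, Any]:
    """Fill in the documented defaults for options missing from a mirror file."""
    merged = dict(config)
    remote = merged.setdefault("remote_url", DEFAULT_REMOTE_URL)
    base = remote.rstrip("/")
    # The default channels are derived from remote_url.
    merged.setdefault("channels", [f"{base}/pkgs/free/", f"{base}/pkgs/pro/"])
    merged.setdefault("dest_channel", "anaconda")
    return merged
```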


SSL verification

The mirroring tool uses two different settings for configuring SSL verification. When the mirroring tool connects to its destination, it uses the ssl_verify setting from anaconda-enterprise-cli to determine how to validate certificates. For example, to use a custom certificate authority:

anaconda-enterprise-cli config set sites.master.ssl_verify /etc/ssl/certs/ca-certificates.crt

The mirroring tool uses conda’s configuration to determine how to validate certificates when connecting to the source that it is pulling packages from. For example, to disable certificate validation when connecting to the source:

conda config --set ssl_verify false

Mirroring in a proxied environment

If Workbench is installed in a proxied environment, set the NO_PROXY variable. This ensures the mirroring tool does not use the proxy when communicating with the repository service, and prevents errors such as Max retries exceeded, Cannot connect to proxy, and Tunnel connection failed: 503 Service Unavailable.

export NO_PROXY=<master-node-domain-name>

Platform-specific mirroring

By default, the cas-sync-api-v5 tool mirrors all platforms. If you do not need all platforms, edit the YAML file to specify the platform(s) you want mirrored:

platforms:
  - linux-64
  - osx-64
  - win-64

Note

The platform argument is evaluated before any other argument.


Package-specific mirroring

In some cases, you may want to mirror only a small subset of the repository. Rather than blocklisting a long list of packages you do not want mirrored, you can use pkg_list to enumerate only the packages you DO want mirrored.

Note

The pkg_list argument cannot be used with the blocklist, allowlist, or license_blocklist arguments. It can only be combined with platform-specific and version-specific mirroring.

EXAMPLE:

pkg_list:
  - accelerate
  - pyqt
  - zope

This example mirrors only three packages: accelerate, pyqt, and zope. All other packages are ignored.


Python version-specific mirroring

To mirror only packages built for specific Python versions, list those versions under python_versions.

EXAMPLE:

python_versions:
  - 3.3

This example mirrors only Anaconda packages built for Python 3.3.


License blocklist mirroring

The mirroring script supports license blocklisting for the following license families:

AGPL
GPL2
GPL3
LGPL
BSD
MIT
Apache
PSF
Public-Domain
Proprietary
Other

EXAMPLE:

license_blocklist:
  - GPL2
  - GPL3
  - BSD

This example mirrors all the packages in the repository EXCEPT those that are GPL2-, GPL3-, or BSD-licensed, because those three licenses have been blocklisted.


Blocklist mirroring

The blocklist argument mirrors all packages EXCEPT those explicitly listed. If the license_blocklist and blocklist arguments are combined, license_blocklist is evaluated first, and blocklist is a supplemental modifier.

EXAMPLE:

blocklist:
  - bzip2
  - tk
  - openssl

This example mirrors the entire repository EXCEPT the bzip2, Tk, and OpenSSL packages.


Allowlist mirroring

The allowlist argument includes packages that would otherwise be excluded by the blocklist or license_blocklist arguments.

EXAMPLE:

license_blocklist:
  - GPL2
  - GPL3
allowlist:
  - readline

This example mirrors the entire repository EXCEPT any GPL2- or GPL3-licensed packages, but includes readline, even though it is GPL3-licensed.


Combining multiple mirror configurations

You may find that combining two or more of the arguments above is the easiest way to get the exact combination of packages that you want.

Note

The platform argument is evaluated before any other argument.

EXAMPLE: This example mirrors only Linux-64 distributions of the dnspython, Shapely and GDAL packages:

platforms:
  - linux-64
pkg_list:
  - dnspython
  - shapely
  - gdal

If the license_blocklist and blocklist arguments are combined, license_blocklist is evaluated first, and blocklist is a supplemental modifier.

EXAMPLE: In this example, the mirror configuration does not mirror GPL2-licensed packages. It does not mirror the GPL3-licensed package pyqt because it has been blocklisted. It does mirror all other packages in the repository:

license_blocklist:
  - GPL2
blocklist:
  - pyqt

If the blocklist and allowlist arguments are both employed, the blocklist is evaluated first, with the allowlist functioning as a modifier.

EXAMPLE: This example mirrors all packages in the repository except astropy and pygments. Despite being listed on the blocklist, accelerate is mirrored because it is listed on the allowlist.

blocklist:
  - accelerate
  - astropy
  - pygments
allowlist:
  - accelerate
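The precedence rules on this page can be summarized in code. The following Python sketch is a hypothetical model, not cas-sync-api-v5's implementation. Pkg and select are illustrative names, python_versions is not modeled, and raising an error for a conflicting pkg_list is this sketch's choice, since the real tool may report it differently. It applies platforms first, then either pkg_list or the license_blocklist, blocklist, and allowlist chain.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Pkg:
    name: str
    platform: str        # conda subdir, e.g. "linux-64" or "noarch"
    license_family: str  # e.g. "GPL3", "BSD", "Proprietary"


def select(pkgs: Iterable[Pkg],
           platforms: Optional[List[str]] = None,
           pkg_list: Optional[List[str]] = None,
           license_blocklist: Optional[List[str]] = None,
           blocklist: Optional[List[str]] = None,
           allowlist: Optional[List[str]] = None) -> List[Pkg]:
    """Sketch of the documented evaluation order for conda mirror filters."""
    if pkg_list and (license_blocklist or blocklist or allowlist):
        raise ValueError("pkg_list cannot be combined with allowlist, "
                         "blocklist, or license_blocklist")
    selected = []
    for p in pkgs:
        # 1. platforms is evaluated before every other option.
        if platforms and p.platform not in platforms:
            continue
        # 2. pkg_list: keep only the enumerated packages.
        if pkg_list:
            if p.name in pkg_list:
                selected.append(p)
            continue
        # 3. Excluded if license_blocklist or blocklist matches; evaluating
        #    license_blocklist first does not change this union.
        excluded = (p.license_family in (license_blocklist or [])
                    or p.name in (blocklist or []))
        # 4. allowlist re-includes packages the two blocklists excluded.
        if excluded and p.name not in (allowlist or []):
            continue
        selected.append(p)
    return selected
```

Because platforms is checked first, an allowlist cannot bring back a package whose platform is not listed.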