Project reproducibility in Workbench#

Why do I need to worry about project reproducibility?#

Project reproducibility ensures that your work retains its integrity and utility over time and across different environments. This guarantees that your code will consistently produce the same results, or continue to operate as intended, regardless of where it is run. If you do not ensure that your project is reproducible, you risk having code that could suddenly stop functioning in the future due to changes in the project’s software environment.

How can I achieve project reproducibility?#

Software tools can aid project reproducibility, but complete reproducibility is only achievable when your organization encourages its users to follow the best practices described here.

Using anaconda-project for basic reproducibility:#

The most reliable way to keep your project reproducible indefinitely is to use the anaconda-project lock command to create a fully specified environment, one in which every package used in the project, and each of its dependencies, is locked to a specific version. This ensures that your project is reproduced exactly as it was initially configured, because a newly released package version can never cause an unexpected update or change. For more information about the anaconda-project-lock.yml file, see the official Anaconda Project documentation.

Project locking is not automatically enforced because it is a time-consuming process and isn’t necessary when running development instances of the deployment. Therefore, it is up to the user to perform the locking procedure.

Tip

Run the anaconda-project lock command when you have finalized work on a project, or if you are ready to move a specific version of your project to a production environment (i.e. you are ready for your project to be interacted with or used publicly).
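As a minimal sketch of how a team might confirm that a project has been locked before it moves to production, the following Python helper checks for the anaconda-project-lock.yml file in a project directory. The function name lock_is_enabled is hypothetical, and the check assumes that the lock file records its state in a top-level locking_enabled: true entry; verify that assumption against the lock file your version of anaconda-project generates.

```python
import re
from pathlib import Path

# File written by `anaconda-project lock` in the project directory.
LOCK_FILE_NAME = "anaconda-project-lock.yml"


def lock_is_enabled(project_dir):
    """Return True if the project has a lock file with locking switched on.

    Hypothetical helper. It assumes the lock file contains a top-level
    `locking_enabled: true` line; it does not validate the locked packages.
    """
    lock_file = Path(project_dir) / LOCK_FILE_NAME
    if not lock_file.is_file():
        return False
    text = lock_file.read_text(encoding="utf-8")
    # Match the key only at the start of a line, so that nested or
    # commented-out occurrences are not counted.
    pattern = r"^locking_enabled:\s*true\s*$"
    return re.search(pattern, text, flags=re.MULTILINE) is not None
```

A check like this could run before a project is promoted, for example lock_is_enabled("path/to/project"). If it returns False, run anaconda-project lock in the project directory and commit the resulting file.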

Managing environments for reproducibility#

Locking your project is essential, but it depends on the user creating the lock file and committing it to the project. Furthermore, anaconda-project runs a “solve” (that is, a resolution of dependencies) every time the project deploys, which can undermine project reproducibility if the conda solver version has changed.

With Workbench versions 5.5 and newer, administrators can create and distribute pre-solved “persistent environments” to their users. These environments are like preconfigured workspaces set up by an administrator that have all the necessary tools and software already in place so their team can work without needing to make changes to the environment.

Because these environments are pre-solved and fixed, they don’t require a new solve with each deployment. As a result, deployments from these environments are created more quickly, and your project runs as expected every time, regardless of external changes such as updates to the conda solver itself.

For instructions on configuring persistent environments and supplying them to your users, see Configuring persistent environments and sample projects.

Best practices for managing environments for reproducibility:#

Create comprehensive environments - Ensure that every managed environment is set up with all necessary libraries and packages that users might need for their work, including proprietary or internal packages unique to your organization. This will allow users to perform their tasks without needing to modify the environment.

Avoid unnecessary modifications - Encourage users to avoid independently modifying the anaconda-project.yml file of persistent environments that are provided by administrators. This practice helps maintain the integrity and reproducibility of the environment.

Promote project locking if modifications are required - If users must make changes, they should use the anaconda-project lock command to lock the environment afterwards, ensuring it can be precisely recreated later. They must also inform the administrator of these changes, so future environments can be adjusted to incorporate these needs without further modifications. For step-by-step instructions on locking your project, see Locking project configurations.

Tip

Run the anaconda-project lock command when you move to production, even if you created your project using a persistent environment with no configuration modifications!

Continuous updates - Administrators should routinely create new persistent environments to incorporate updates to packages and their dependencies. This keeps the environments up to date with the latest features and security patches.

Naming conventions - It’s advisable to name these environments systematically, including the date of creation, which helps in identifying and managing them. For example, you could adopt the following structure: <GROUP>_<YYYY><MM> (where <GROUP> denotes the intended users for the environment and <YYYY><MM> is the year and month).

Deprecate old environments safely - Before an administrator deprecates an old environment, it’s critical to ensure that it is not in use. Removing an environment that is still in use could disrupt ongoing projects.
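The <GROUP>_<YYYY><MM> naming convention suggested above can be generated programmatically so that every administrator produces names in the same form. This is only an illustrative sketch: the function name environment_name and the group name "datascience" are hypothetical, and nothing in Workbench requires this helper.

```python
from datetime import date


def environment_name(group, created=None):
    """Build an environment name of the form <GROUP>_<YYYY><MM>.

    Hypothetical helper. `group` identifies the intended users and
    `created` is the creation date, which defaults to today.
    """
    if created is None:
        created = date.today()
    # %Y is the four-digit year and %m is the zero-padded month.
    return f"{group}_{created:%Y%m}"


# An environment created for the data science group in March 2024.
print(environment_name("datascience", date(2024, 3, 15)))  # datascience_202403
```

Because the year and month are zero-padded and ordered from largest unit to smallest, sorting the names alphabetically within a group also sorts them chronologically, which makes the oldest candidates for deprecation easy to spot.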