NFS Storage Recommendations#

A common mechanism for provisioning the storage required for Data Science & AI Workbench persistence is a Network File System (NFS) server. This includes many cloud offerings, such as Amazon EFS and Google Filestore, as well as many on-premises NAS/SAN implementations. In this section, we provide specific recommendations for server- and client-side configuration. These recommendations should be used to augment the general storage requirements offered on our install requirements page.

Server recommendations#

  • If you are building a new machine to serve as your NFS server:

    • It should have at least 4 cores and 16GiB of RAM.

    • Increase the number of threads created by the NFS daemon to at least 64, to reduce the likelihood of contention between multiple users. For information on how to do this, see your operating system documentation; for instance, this Red Hat article. A sample configuration is sketched after this list.

    • If possible, use this file server as your administration server as well; this makes it easier to manage and administer the persisted data. If this is not possible, make sure to export the volume to the administration server as well as to the Kubernetes cluster.

    • If you intend to use the same server for both anaconda-storage and anaconda-persistence, you should consolidate to a single PersistentVolume, as discussed in the general storage requirements.

  • The use of premium storage tiers, and SSD-based storage in particular, is strongly recommended.

  • In many environments, the performance of the volume (e.g., IOPS) is tightly coupled to the size of the disk. For this reason, Anaconda recommends over-provisioning the disk size to obtain better performance. In some environments, IOPS can be provisioned separately, but even then it can be cost-effective to over-provision size instead.

  • Anaconda recommends the use of the async export option.

  • Anaconda recommends against the use of the root_squash option. While seemingly sensible for security reasons, in practice we find that it too often leads to unexpected permission issues. A similar but more reliable approach is to use the all_squash option along with anonuid and anongid, which forces all remote access to be mapped to the same UID and GID on the server. In summary, in order of preference, Anaconda recommends (see the sample export entry after this list):

    • no_root_squash for maximum administrative flexibility, and to allow the containers to use GID 0, the Kubernetes default;

    • all_squash with anonuid and anongid for a reliable option that avoids UID 0 and GID 0;

    • root_squash only if there is no other alternative.

  • To improve both security and performance, locate the file server on the same private subnet as the Kubernetes cluster, and limit the exports to that subnet.
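
For the thread-count recommendation above, the exact mechanism depends on your operating system and nfs-utils version. As a rough sketch, on RHEL 8 and later the value can be set in the [nfsd] section of /etc/nfs.conf (older releases use RPCNFSDCOUNT in /etc/sysconfig/nfs):

# /etc/nfs.conf (sketch; consult your OS documentation for the exact location)
[nfsd]
threads=64

Then restart the NFS service (for example, systemctl restart nfs-server) and confirm the running thread count with cat /proc/fs/nfsd/threads.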
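
The export options above can be combined into a single entry in /etc/exports. The sketch below is illustrative only; the exported path, cluster subnet, and anonymous UID/GID are placeholders you must replace with your own values:

# /etc/exports (sketch)
# Preferred: async with no_root_squash, limited to the cluster subnet
/export/anaconda  10.0.0.0/24(rw,async,no_root_squash)

# Alternative if UID/GID 0 must be avoided: map all remote access to one UID/GID
# /export/anaconda  10.0.0.0/24(rw,async,all_squash,anonuid=1000,anongid=1000)

After editing /etc/exports, reload the export table with exportfs -ra.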

Client recommendations#

  • When mounting the NFS share, Anaconda recommends overriding the default read and write block sizes with the options rsize=65536 and wsize=65536. Smaller block sizes are preferred because creating conda environments frequently involves manipulating thousands of small files, and larger block sizes introduce significant inefficiency. A sample mount command follows this list.

  • We also recommend the use of the noatime option. This eliminates the updating of file access times over NFS, further reducing network overhead. Note that file modification times are still preserved.
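
To check these options from an arbitrary Linux client before wiring them into Kubernetes, a manual mount along the following lines can be used; the server address, export path, and mount point below are placeholders:

mount -t nfs -o rsize=65536,wsize=65536,noatime <ADDRESS>:<PATH> /mnt/anaconda-test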

Persistent Volume specifications#

Encapsulating the client recommendations into the PersistentVolume and PersistentVolumeClaim specifications is relatively simple.

Begin with the following template, called (for instance) pv.yaml:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: <NAME>
  annotations:
    pv.beta.kubernetes.io/gid: "<GID>"
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteMany
  mountOptions:
    - rsize=65536
    - wsize=65536
    - noatime
  nfs:
    server: <ADDRESS>
    path: <PATH>
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: <NAME>
spec:
  accessModes:
    - ReadWriteMany
  volumeName: <NAME>
  storageClassName: ""
  resources:
    requests:
      storage: 100Gi

Perform the following replacements:

  • <NAME>: you can give this any name you wish, or adhere to our conventions of anaconda-storage and/or anaconda-persistence. This name will ultimately be supplied to the Helm chart values. Note that <NAME> appears in three places; use the same value for all.

  • <GID>: the group ID that has write access to the volume. As discussed above, the recommended value is 0; but if you are forced to use root_squash or all_squash, make sure this matches the GID you selected instead (for all_squash, the anongid value). The quotes must be preserved.

  • <ADDRESS>: the FQDN or numeric IP address of the NFS server.

  • <PATH>: the exported path from the NFS server.

The storage size entries in the two resources do not need to be changed, even if your volume is (as is likely) significantly larger. All that matters is that the two values match.

Once this template is properly populated, you can create the resources with the command:

kubectl create -f pv.yaml
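
To confirm that the claim has bound to the volume, inspect both resources and check that their status reads Bound (substitute the name you chose; add -n <namespace> if the claim was created in a specific namespace):

kubectl get pv <NAME>
kubectl get pvc <NAME>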

If you have allocated two different NFS volumes for anaconda-storage and anaconda-persistence, repeat this template for each.

Caution

When creating the PersistentVolume (PV) using NFS as the provider, specify your NFS version under mountOptions to avoid performance issues within your Kubernetes platform. For example:

mountOptions:
- hard
- nfsvers=3
- rsize=65536
- wsize=65536
- noatime
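
If you are unsure which NFS protocol versions your server supports, you can check before pinning nfsvers; for example (the server address is a placeholder):

# list the NFS versions the server advertises
rpcinfo -p <ADDRESS> | grep nfs

# on a client that already has the share mounted, show the negotiated options
nfsstat -m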