NFS Storage Recommendations
A common mechanism for provisioning the storage required for Data Science & AI Workbench persistence is a Network File System (NFS) server. This includes many cloud offerings, such as Amazon EFS and Google Filestore, and many on-premises NAS/SAN implementations. This section provides specific recommendations for server- and client-side configuration. These recommendations augment the general storage requirements given on our install requirements page.
Server recommendations
If you are building a new machine to serve as your NFS server:
It should have at least 4 cores and 16GiB of RAM.
Increase the number of threads created by the NFS daemon to at least 64 to reduce the likelihood of contention between multiple users. For information on how to do this, see your operating system documentation; for instance, this Red Hat article.
If possible, use this file server as your administration server as well, so that the persisted data can be managed and administered directly on the machine that holds it. If this is not possible, make sure to export the volume to the administration server as well as the Kubernetes cluster.
If you intend to use the same server for both anaconda-storage and anaconda-persistence, consolidate to a single PersistentVolume, as discussed in the general storage requirements.
The use of premium storage tiers, and SSD-based storage in particular, is strongly recommended.
In many environments, the performance of the volume (e.g., IOPS) is tightly coupled to the size of the disk. For this reason, Anaconda recommends over-provisioning the size of the disk to take advantage of this. In some environments, IOPS can be provisioned separately, but it can still be cost-effective to over-provision size instead.
Anaconda recommends the use of the async export option.
Anaconda recommends against the use of the root_squash option. Although it seems a sensible choice for security reasons, in practice it too often leads to unexpected permissions issues. A similar and more reliable option is all_squash along with anonuid and anongid, which forces all remote access to be translated to the same UID and GID on the server. In summary, in order of preference, Anaconda recommends:
no_root_squash, for maximum administration flexibility, and to allow the containers to utilize GID 0, the Kubernetes default;
all_squash/anonuid/anongid, for a reliable option that avoids UID 0 and GID 0;
root_squash, only if there is no other alternative.
To improve both security and performance, locate the file server on the same private subnet as the Kubernetes cluster, and limit the exports to that subnet.
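As a rough illustration of the export options above, the following sketch prints candidate /etc/exports entries for each of the three preferences. The export directory /srv/anaconda, the subnet 10.0.0.0/24, and the UID/GID value 1000 are assumptions chosen for the example, and the helper name exports_entry is hypothetical; substitute values from your own environment.

```shell
#!/bin/sh
# Sketch only: prints /etc/exports entries; it does not modify the system.
# EXPORT_DIR, SUBNET and ANON_ID are illustrative assumptions.
EXPORT_DIR="/srv/anaconda"
SUBNET="10.0.0.0/24"
ANON_ID="1000"

# Hypothetical helper: format one exports entry restricted to SUBNET.
exports_entry() {
  printf '%s %s(%s)\n' "$EXPORT_DIR" "$SUBNET" "$1"
}

# 1. Preferred: no_root_squash, so that containers can use GID 0.
exports_entry "rw,async,no_root_squash"
# 2. Alternative: map every remote user to a single UID and GID.
exports_entry "rw,async,all_squash,anonuid=${ANON_ID},anongid=${ANON_ID}"
# 3. Last resort: root_squash.
exports_entry "rw,async,root_squash"
```

Only one of the three lines would be placed in /etc/exports. After editing that file, the export table is typically reloaded with exportfs -ra.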
Client recommendations
When mounting the NFS share, Anaconda recommends overriding the default read and write block sizes by using the options rsize=65536,wsize=65536. Smaller block sizes are preferred because creating conda environments frequently involves manipulating thousands of small files, and large block sizes result in significant inefficiency.
We also recommend the use of the noatime option. This eliminates the updating of file access times over NFS, further reducing network overhead. Note that file modification times are still preserved.
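For a manual test mount outside Kubernetes, the client options above might be applied as in the following sketch. The server name nfs.example.internal, the export directory /srv/anaconda, and the mount point /mnt/anaconda are assumptions for illustration, and the helper names are hypothetical. The script only prints the commands; it does not run them.

```shell
#!/bin/sh
# Sketch only: prints the mount command and fstab entry rather than running them.
# NFS_SERVER, EXPORT_DIR and MOUNT_POINT are illustrative assumptions.
NFS_SERVER="nfs.example.internal"
EXPORT_DIR="/srv/anaconda"
MOUNT_POINT="/mnt/anaconda"

# Client options recommended in this section.
MOUNT_OPTS="rsize=65536,wsize=65536,noatime"

# One-off manual mount (requires root when actually executed).
mount_command() {
  printf 'mount -t nfs -o %s %s:%s %s\n' \
    "$MOUNT_OPTS" "$NFS_SERVER" "$EXPORT_DIR" "$MOUNT_POINT"
}

# Equivalent /etc/fstab entry for a persistent mount.
fstab_entry() {
  printf '%s:%s %s nfs %s 0 0\n' \
    "$NFS_SERVER" "$EXPORT_DIR" "$MOUNT_POINT" "$MOUNT_OPTS"
}

mount_command
fstab_entry
```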
Persistent Volume specifications
Encapsulating the client recommendations into the PersistentVolume and PersistentVolumeClaim specifications is relatively simple.
Begin with the following template, called (for instance) pv.yaml:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: <NAME>
  annotations:
    pv.beta.kubernetes.io/gid: "<GID>"
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteMany
  mountOptions:
    - rsize=65536
    - wsize=65536
    - noatime
  nfs:
    server: <ADDRESS>
    path: <PATH>
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: <NAME>
spec:
  accessModes:
    - ReadWriteMany
  volumeName: <NAME>
  storageClassName: ""
  resources:
    requests:
      storage: 100Gi
Perform the following replacements:
<NAME>: you can give this any name you wish, or adhere to our conventions of anaconda-storage and/or anaconda-persistence. This name will ultimately be supplied to the Helm chart values. Note that <NAME> appears in three places; use the same value for all.
<GID>: this is the group ID which has write access to the volume. As discussed above, the recommended value is 0; but if you are forced to use root_squash or all_squash, make sure this has the value of the selected GID. The quotes must be preserved.
<ADDRESS>: the FQDN or numeric IP address of the NFS server.
<PATH>: the exported path from the NFS server.
The storage size entry (100Gi) in both resources does not need to be changed, even if your volume is (as is likely) significantly larger. All that matters in this case is that the two values are the same.
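The replacements can be made by hand in an editor, or scripted. The following is a minimal sketch using sed; the function name render_pv is hypothetical, and the values for the server address and export path are assumptions for illustration only.

```shell
#!/bin/sh
# Sketch only: fill in the template placeholders with sed.
# ADDRESS and EXPORT_PATH are illustrative assumptions, not defaults.
NAME="anaconda-persistence"
GID="0"
ADDRESS="nfs.example.internal"
EXPORT_PATH="/srv/anaconda"

# Hypothetical helper: reads a template on stdin and writes the populated
# manifest to stdout. '|' is the sed delimiter because the path contains '/'.
# Only the placeholder text is replaced, so the quotes around <GID> survive.
render_pv() {
  sed -e "s|<NAME>|${NAME}|g" \
      -e "s|<GID>|${GID}|g" \
      -e "s|<ADDRESS>|${ADDRESS}|g" \
      -e "s|<PATH>|${EXPORT_PATH}|g"
}

# Small demonstration on two template lines:
printf 'server: <ADDRESS>\npath: <PATH>\n' | render_pv
```

Against the full template, this could be used as render_pv < pv.yaml > pv-populated.yaml, with the populated file then passed to kubectl in place of the template.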
Once this template is properly populated, you can create the resources with the command:
kubectl create -f pv.yaml
If you have allocated two different NFS volumes for anaconda-storage and anaconda-persistence, repeat this template for each.
Caution
When creating the Persistent Volume (PV) using NFS as the provider, specify your NFS version under mountOptions to avoid performance issues within your Kubernetes platform. For example:
mountOptions:
  - hard
  - nfsvers=3
  - rsize=65536
  - wsize=65536
  - noatime
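To check which NFS version and block sizes a client actually ended up with, the mount table on a node where the volume is mounted can be inspected. The sketch below assumes the usual /proc/mounts line layout on Linux; the function name nfs_mount_summary and the sample line are hypothetical and serve only to show the output format.

```shell
#!/bin/sh
# Sketch only: summarize the NFS version and block sizes of mounted NFS volumes.
# Expected input: lines in /proc/mounts layout
#   <source> <mountpoint> <fstype> <options> <dump> <pass>
nfs_mount_summary() {
  awk '$3 ~ /^nfs4?$/ {
    n = split($4, opts, ",")
    vers = "unknown"; rsize = "unknown"; wsize = "unknown"
    for (i = 1; i <= n; i++) {
      if (opts[i] ~ /^(nfs)?vers=/) { sub(/^(nfs)?vers=/, "", opts[i]); vers = opts[i] }
      else if (opts[i] ~ /^rsize=/) { sub(/^rsize=/, "", opts[i]); rsize = opts[i] }
      else if (opts[i] ~ /^wsize=/) { sub(/^wsize=/, "", opts[i]); wsize = opts[i] }
    }
    printf "%s vers=%s rsize=%s wsize=%s\n", $2, vers, rsize, wsize
  }'
}

# On a node with the volume mounted: nfs_mount_summary < /proc/mounts
# Sample line with illustrative values:
echo 'nfs.example.internal:/srv/anaconda /mnt/anaconda nfs rw,noatime,vers=3,rsize=65536,wsize=65536,hard 0 0' \
  | nfs_mount_summary
```

If the reported values differ from those requested in mountOptions, the server may have negotiated different settings, which is worth investigating before deploying.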