Connecting to S3

Data Science & AI Workbench enables you to connect to the Amazon Simple Storage Service (S3) to access data stored there.

Before you can do so, however, you’ll need to install the s3fs package, which contains the Python filesystem interface required to connect to S3:

conda install -c anaconda s3fs

Note

Any packages you install from the command line are available during the current session only. If you want them to persist, add them to the project’s anaconda-project.yml file. For more information, see Project configurations.
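For example, adding s3fs to the packages list in anaconda-project.yml might look like the following (a minimal sketch; your project file will contain other keys as well):

packages:
  - s3fs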

You can then use code such as this to access a specific S3 bucket from within a notebook session:

import configparser
from s3fs.core import S3FileSystem

Your credentials file must be in .ini format and look like the following:

[default]
aws_access_key_id=ACCESS_KEY_HERE
aws_secret_access_key=SECRET_ACCESS_KEY_HERE

# configparser is in the standard library and can read the .ini file,
# so you can use the credentials to set up the S3 object below
config = configparser.ConfigParser()
config.read('/var/run/secrets/user_credentials/aws_credentials')

# Set up the object using the credentials from the file
fs = S3FileSystem(
    anon=False,
    key=config.get('default', 'aws_access_key_id'),
    secret=config.get('default', 'aws_secret_access_key')
)
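# Note: if AWS credentials already exist in the standard locations
# (environment variables or ~/.aws/credentials), s3fs can discover them
# automatically; in that case you could write simply:
# fs = S3FileSystem(anon=False)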

# Provide the bucket and file name
bucket = 'test_bucket'
file_name = 'testing.txt'

# List the contents of the specified bucket
all_files = fs.ls(bucket)
print(all_files)

# Read the specified file from the named bucket
with fs.open(f'{bucket}/{file_name}', mode='rb') as f:
    print(f.read())
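
If your credentials grant write access, the same fs.open call can also be used to write objects back to the bucket. The following is a minimal sketch; the file name and contents are placeholders:

# Write a new object to the bucket (requires write permission)
with fs.open(f'{bucket}/output.txt', mode='wb') as f:
    f.write(b'written from a Workbench session')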

See Secrets for information about adding credentials to the platform so that they are available in your projects. Any secrets you add will be available across all sessions and deployments associated with your user account.