Connecting to MongoDB

Data Science & AI Workbench enables you to connect to a MongoDB database to work with data in its document-oriented store.

To access MongoDB from a notebook session, install the PyMongo driver with conda using the following command:

conda install -c anaconda pymongo

NOTE: Any packages you install from the command line are available during the current session only. If you want them to persist, add them to the project’s anaconda-project.yml file. For more information, see Developing a project.

After you’ve installed the pymongo driver, start by importing it, along with the json module, in your notebook session:

import pymongo
import json

NOTE: The connection example below uses the default authentication mechanism, SCRAM-SHA-256. You’ll need to change this if you are using a different mechanism. See the PyMongo documentation for additional authentication examples.

If you need to connect to a replica set, see the replica set examples in the PyMongo documentation.
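
As a rough illustration of the two notes above, the sketch below builds a connection URI that names an authentication mechanism and a replica set explicitly. The helper build_mongo_uri, the host names, and the replica set name rs0 are hypothetical and are not part of the platform or of PyMongo; authSource, authMechanism, and replicaSet are standard MongoDB connection string options.

```python
from urllib.parse import quote_plus, urlencode


def build_mongo_uri(hosts, username, password, auth_source="test",
                    auth_mechanism=None, replica_set=None):
    """Build a MongoDB connection URI from its parts (hypothetical helper)."""
    # Percent-encode credentials so characters such as '@' or ':' stay valid in a URI
    user = quote_plus(username)
    pwd = quote_plus(password)

    # Only include the optional settings that were actually requested
    options = {"authSource": auth_source}
    if auth_mechanism:
        options["authMechanism"] = auth_mechanism
    if replica_set:
        options["replicaSet"] = replica_set

    return f"mongodb://{user}:{pwd}@{','.join(hosts)}/?{urlencode(options)}"


# Assumed example values: two replica set members and a non-default mechanism
uri = build_mongo_uri(
    ["mongo-1.example.com:27017", "mongo-2.example.com:27017"],
    "USERNAME", "p@ss:word",
    auth_mechanism="SCRAM-SHA-1", replica_set="rs0",
)
print(uri)
```

The resulting string can be passed as the first argument to pymongo.MongoClient in place of a bare host name. Avoid printing a URI that contains a real password.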

With pymongo and json imported, you can use code such as this to connect to MongoDB and query it from within a notebook session:

"""
Get credentials from Kubernetes. The credentials were set up as a dictionary. For example:
{
    "username": "USERNAME",
    "password": "PASSWORD"
}
"""
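credentials = None
try:
    with open('/var/run/secrets/user_credentials/mongo_credentials') as f:
        credentials = json.load(f)
except (OSError, json.JSONDecodeError):
    # Leave credentials as None so the error message below is printed
    pass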

# Verify the credentials were pulled correctly
if credentials:
    # Connect to Mongo
    client = pymongo.MongoClient(
        'support-mongo.dev.anaconda.com',
        username=credentials.get('username'),
        password=credentials.get('password'),
        authSource='test'
    )

    # Get the database you want to access
    db = client.get_database('test')

    # Query a collection for a single record and print it out
    # (default=str converts values json cannot serialize, such as ObjectId)
    single_result = db.zipcodes.find_one()
    print(json.dumps(single_result, indent=4, default=str))

    # Query a collection for multiple records, then loop through and print them
    multiple_results = db.zipcodes.find().limit(10)
    for item in multiple_results:
        print(json.dumps(item, indent=4, default=str))
else:
    # If the credentials could not be loaded, print an error
    print(
        'Could not get credentials. Ensure that you have set up your '
        'user credentials for this database server.'
    )
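
Documents returned by PyMongo can hold values that the standard json module cannot serialize, such as ObjectId or datetime, and json.dumps raises a TypeError for them unless a fallback converter is supplied. The following is a minimal sketch of one workaround, using a made-up dictionary in place of a real query result; the field names and values are assumptions for illustration only.

```python
import json
from datetime import datetime, timezone

# Stand-in for a document returned by find_one(); fields and values are hypothetical.
# Real results may also contain ObjectId values, which str() converts the same way.
document = {
    "_id": "example-id",
    "city": "EXAMPLE CITY",
    "updated": datetime(2024, 1, 15, 12, 30, tzinfo=timezone.utc),
}

# default=str is called only for values that json cannot serialize natively
text = json.dumps(document, indent=4, default=str)
print(text)
```

Converting with str is lossy, because the original type cannot be recovered from the output. If you need to round-trip BSON types, the bson.json_util module that ships with PyMongo is likely a better fit.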

See Storing secrets for information about adding credentials to the platform, to make them available in your projects. Any secrets you add will be available across all sessions and deployments associated with your user account.
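
Because a secret may be missing or malformed in a given session, it can help to load it defensively. The sketch below shows one possible approach; load_credentials is a hypothetical helper, not a platform API, and a temporary file stands in for the mounted secret so that the example runs anywhere.

```python
import json
import os
import tempfile


def load_credentials(path):
    """Return the credentials dictionary stored at path, or None if unavailable."""
    try:
        with open(path) as f:
            data = json.load(f)
    except (OSError, json.JSONDecodeError):
        # Missing or unreadable file, or contents that are not valid JSON
        return None
    # Require both keys so that a partial secret is treated as missing
    if not isinstance(data, dict) or not data.keys() >= {"username", "password"}:
        return None
    return data


# Demonstration with a temporary file standing in for the mounted secret
with tempfile.TemporaryDirectory() as tmp:
    secret_path = os.path.join(tmp, "mongo_credentials")
    with open(secret_path, "w") as f:
        json.dump({"username": "USERNAME", "password": "PASSWORD"}, f)

    found = load_credentials(secret_path)
    missing = load_credentials(os.path.join(tmp, "does_not_exist"))

print(found)
print(missing)
```

In a session, you would call the helper with the path used earlier, /var/run/secrets/user_credentials/mongo_credentials, and connect only when it returns a dictionary.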