Connecting to Vertica

Data Science & AI Workbench lets you connect to HP Vertica, a distributed analytics database that stores data in columnar format.

Before you can do so, use conda to install the vertica-python package, which contains the Python adapter for Vertica databases:

conda install -c conda-forge vertica-python

NOTE: Any packages you install from the command line are available during the current session only. If you want them to persist, add them to the project’s anaconda-project.yml file. For more information, see Developing a project.

You can then use code such as this to import the library and connect to Vertica from within a notebook session:

import vertica_python
import json

# Get credentials from Kubernetes. The credentials were set up as a dictionary
credentials = None
with open('/var/run/secrets/user_credentials/vertica_credentials') as f:
    credentials = json.load(f)

# Verify the credentials were pulled correctly
if credentials:
    # Set up connection dictionary based on secrets obtained above
    connection_info = {
        'host': credentials.get('hostname'),
        'port': 5433,
        'user': credentials.get('username'),
        'password': credentials.get('password'),
        'database': 'VMart'
    }

    # Use a with statement so the connection closes automatically when the block exits
    with vertica_python.connect(**connection_info) as connection:
        cursor = connection.cursor()

        # Set up example query for the sample dataset that is being used
        example_query = (
            "SELECT DISTINCT s.product_key, p.product_description "
            "FROM store.store_sales_fact s, public.product_dimension p "
            "WHERE s.product_key = p.product_key "
            "AND s.product_version = p.product_version AND s.store_key IN ("
            "  SELECT store_key "
            "  FROM store.store_dimension "
            "  WHERE store_state = 'MA') "
            "ORDER BY s.product_key LIMIT 50;"
        )
        # Execute the query
        cursor.execute(example_query)
        # Iterate through the cursor and print each row
        for row in cursor.iterate():
            print(row)
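
The example above only checks that the credentials file parsed to a non-empty value, so a missing key would surface later as a failed connection. As a minimal sketch, the mapping from the secret to the connection dictionary could be moved into a helper that fails early. The names build_connection_info, load_connection_info, and REQUIRED_KEYS are hypothetical and not part of vertica-python. The key names (hostname, username, password) and the defaults (port 5433, database VMart) are taken from the example above.

```python
import json

# Keys the example above expects to find in the mounted secret (assumed layout).
REQUIRED_KEYS = ("hostname", "username", "password")


def build_connection_info(credentials, database="VMart", port=5433):
    """Map a credentials dictionary to keyword arguments for vertica_python.connect()."""
    # Collect every required key that is absent or empty so the error names them all.
    missing = [key for key in REQUIRED_KEYS if not credentials.get(key)]
    if missing:
        raise ValueError("Missing credential keys: " + ", ".join(missing))
    return {
        "host": credentials["hostname"],
        "port": port,
        "user": credentials["username"],
        "password": credentials["password"],
        "database": database,
    }


def load_connection_info(path, **overrides):
    """Read a JSON credentials file and build the connection dictionary from it."""
    with open(path) as f:
        return build_connection_info(json.load(f), **overrides)
```

With this in place, the notebook code could call load_connection_info('/var/run/secrets/user_credentials/vertica_credentials') and pass the result to vertica_python.connect(**connection_info) as before. A misconfigured secret then raises a ValueError that names the missing keys.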