Connecting to Vertica#
Data Science & AI Workbench enables you to easily connect to HP Vertica, a distributed analytics database that stores data in column format.
Before you can do so, you’ll need to conda install the vertica-python
package, which contains the Python adapter for Vertica databases:
conda install -c conda-forge vertica-python
Note
Any packages you install from the command line are available during the current session only. If you want them to persist, add them to the project’s anaconda-project.yml
file. For more information, see Project configurations.
You can then use code such as this to import the library and connect to Vertica from within a notebook session:
import vertica_python
import json
# Get credentials from Kubernetes. The credentials were setup as a dictionary
credentials = None
with open('/var/run/secrets/user_credentials/vertica_credentials') as f:
credentials = json.load(f)
# Verify the credentials were pulled correctly
if credentials:
# Set up connection dictionary based on secrets obtained above
connection_info = {
'host': credentials.get('hostname'),
'port': 5433,
'user': credentials.get('username'),
'password': credentials.get('password'),
'database': 'VMart'
}
# Use a with statement that auto closes the connection once out of the loop
with vertica_python.connect(**connection_info) as connection:
cursor = connection.cursor()
# Set up example query for the sample dataset that is being used
example_query = (
"SELECT DISTINCT s.product_key, p.product_description "
"FROM store.store_sales_fact s, public.product_dimension p "
"WHERE s.product_key = p.product_key "
"AND s.product_version = p.product_version AND s.store_key IN ("
" SELECT store_key "
" FROM store.store_dimension "
" WHERE store_state = 'MA') "
"ORDER BY s.product_key LIMIT 50;"
)
# Execute the query
cursor.execute(example_query)
# Iterate through the cursor and print each row
for row in cursor.iterate():
print(row)