How to Write a Python Script to Connect to a Presto Cluster
If you need a quick tutorial on how to write a python script to connect to a Presto cluster and run a query, you came to the right place.
Prerequisites
In this tutorial we’ll be using a Python3 environment, and the SQLAlchemy. SQLAlchemy is the Python SQL toolkit and Object Relational Mapper that gives application developers the full power and flexibility of SQL.
- You will need a working python installation – check that you have with:
$ python3 --version
Python 3.9.1
With Python 3.4 or later you will also have pip3 installer, which we’ll use to install two modules.
2. Install the sqlalchemy module using pip3:
$ pip3 install SQLAlchemy
Collecting SQLAlchemy
Downloading SQLAlchemy-1.3.23-cp39-cp39-macosx_10_14_x86_64.whl (1.2 MB)
Installing collected packages: SQLAlchemy
Successfully installed SQLAlchemy-1.3.23
3. Install the pyhive interface for Presto. PyHive is a collection of Python DB-API and SQLAlchemy interfaces for Presto and Hive. We want the Presto interface:
$ pip3 install 'pyhive[presto]'
Collecting pyhive[presto]
...
Successfully built pyhive future
Installing collected packages: six, urllib3, python-dateutil, idna, future, chardet, certifi, requests, pyhive
Successfully installed certifi-2020.12.5 chardet-4.0.0 future-0.18.2 idna-2.10 pyhive-0.6.3 python-dateutil-2.8.1 requests-2.25.1 six-1.15.0 urllib3-1.26.3
Test Script
Now you have python, SQLAchemy and PyHive ready to roll, create this sample script and save it as presto-test.py. In it we’ll run a simple SELECT query that just returns the Presto cluster’s system timestamp. Just replace the user ID, password, cluster name, and metastore (e.g. the Ahana integrated ahana_hive in this case) with your own. Here we’re using an HTTPS connection so the default port number is 443:
# presto-test.py
# simple python script to run a query on a presto cluster and display the result set
from sqlalchemy import *
from sqlalchemy.engine import create_engine
from sqlalchemy.schema import *
engine = create_engine("presto://user:password@myPrestoClusterName:443/ahana_hive", connect_args={'protocol': 'https'})
with engine.connect() as con:
rs = con.execute('SELECT now()')
for row in rs:
print(row)
Invocation
The script runs the SELECT statement and displays the result set which is a single row:
$ python3 presto-test.py
('2021-02-10 13:16:47.247 UTC',)
There are other methods one can use, like the Presto python client which can be found at https://github.com/prestodb/presto-python-client
We hope you enjoyed this Presto & Python tutorial. All the above information works perfectly with Ahana Cloud for Presto too, the fully integrated, cloud-native, managed service for AWS.