How to Write a Python Script to Connect to a Presto Cluster

If you need a quick tutorial on how to write a python script to connect to a Presto cluster and run a query, you came to the right place.

Prerequisites

In this tutorial we’ll be using a Python3 environment, and the SQLAlchemy.  SQLAlchemy is the Python SQL toolkit and Object Relational Mapper that gives application developers the full power and flexibility of SQL. 

  1. You will need a working python installation – check that you have with:
$ python3 --version
Python 3.9.1

With Python 3.4 or later you will also have pip3 installer, which we’ll use to install two modules.

2. Install the sqlalchemy module using pip3:

$ pip3 install SQLAlchemy
Collecting SQLAlchemy
  Downloading SQLAlchemy-1.3.23-cp39-cp39-macosx_10_14_x86_64.whl (1.2 MB)
Installing collected packages: SQLAlchemy
Successfully installed SQLAlchemy-1.3.23

3. Install the pyhive interface for Presto. PyHive is a collection of Python DB-API and SQLAlchemy interfaces for Presto and Hive. We want the Presto interface:

$ pip3 install 'pyhive[presto]'
Collecting pyhive[presto]
 ...
Successfully built pyhive future
Installing collected packages: six, urllib3, python-dateutil, idna, future, chardet, certifi, requests, pyhive
Successfully installed certifi-2020.12.5 chardet-4.0.0 future-0.18.2 idna-2.10 pyhive-0.6.3 python-dateutil-2.8.1 requests-2.25.1 six-1.15.0 urllib3-1.26.3

Test Script

Now you have python, SQLAchemy and PyHive ready to roll, create this sample script and save it as presto-test.py.  In it we’ll run a simple SELECT query that just returns the Presto cluster’s system timestamp.  Just replace the  user ID, password, cluster name, and metastore (e.g. the Ahana integrated ahana_hive in this case) with your own. Here we’re using an HTTPS connection so the default port number is 443:

# presto-test.py
# simple python script to run a query on a presto cluster and display the result set 
from sqlalchemy import *
from sqlalchemy.engine import create_engine
from sqlalchemy.schema import *
engine = create_engine("presto://user:password@myPrestoClusterName:443/ahana_hive", connect_args={'protocol': 'https'})
 
with engine.connect() as con:
    rs = con.execute('SELECT now()')
    for row in rs:
        print(row)

Invocation

The script runs the SELECT statement and displays the result set which is a single row:

$ python3 presto-test.py
('2021-02-10 13:16:47.247 UTC',)

There are other methods one can use, like the Presto python client which can be found at https://github.com/prestodb/presto-python-client 

We hope you enjoyed this Presto & Python tutorial.  All the above information works perfectly with Ahana Cloud for Presto too, the fully integrated, cloud-native, managed service for AWS.