
This blog post explains how to connect Superset to Presto.
TL;DR
Superset refers to a connection to a distinct data source as a database. A single Presto cluster can connect to multiple data sources by configuring a Presto catalog for each desired data source. Hence, to make a Superset database connection to a particular data source through Presto, you must specify the Presto cluster and catalog in the SQLAlchemy URI as follows: presto://<presto-username>:<presto-password>@<presto-coordinator-url>:<http-server-port>/<catalog>.
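For example, with a hypothetical Presto user alice (password secret), a coordinator at presto-coordinator.example.com, and a catalog named postgresql, the URI would look like:
presto://alice:secret@presto-coordinator.example.com:8080/postgresql
The rest of this post unpacks each piece of that URI.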
Superset and SQLAlchemy
Superset is built as a Python Flask web application and leverages SQLAlchemy, a Python SQL toolkit, to provide a consistent abstraction layer to relational data sources. Superset uses a SQLAlchemy URI as the connection string for a defined Superset database. The schema for the URI is as follows: dialect+driver://username:password@host:port/database. We will deconstruct the dialect, driver, and database in the following sections.
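To see these components concretely, here is a minimal sketch of how SQLAlchemy parses a URI into its parts (the credentials and host are hypothetical, and reappear in the PostgreSQL examples below):
from sqlalchemy.engine.url import make_url

# Parse a URI into the components described above
url = make_url("postgresql+psycopg2://sonny:foobar@pghost:5432/mydatabase")
print(url.drivername)  # 'postgresql+psycopg2', i.e. dialect+driver
print(url.username)    # 'sonny'
print(url.host)        # 'pghost'
print(url.port)        # 5432
print(url.database)    # 'mydatabase'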

SQLAlchemy defines a dialect as the system it uses to communicate with the specifics of a given database (e.g. its flavor of SQL) and its DB-API, the low-level Python API used to talk to a specific relational data source. A Python DB-API database driver is required for a given data source. For example, PyHive is a DB-API driver used to connect to Presto. It is possible for a single dialect to choose between multiple DB-API drivers. For example, the PostgreSQL dialect supports the following DB-API drivers: psycopg2, pg8000, psycopg2cffi, and pygresql. Typically, a single DB-API driver is set as the default for a dialect and is used when no explicit DB-API driver is specified. For PostgreSQL, the default DB-API driver is psycopg2.
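As a quick sketch of this default-driver behavior (psycopg2 must be installed for the import to succeed, but no connection is made until the engine is actually used):
from sqlalchemy import create_engine

# No driver is specified, so the postgresql dialect falls back to its
# default DB-API driver
engine = create_engine("postgresql://sonny:foobar@pghost:5432/mydatabase")
print(engine.dialect.name)    # 'postgresql'
print(engine.dialect.driver)  # 'psycopg2'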
The term database can be confusing since it is heavily loaded. In a typical scenario, a given data source, such as PostgreSQL, has multiple logical groupings of tables, which are called “databases”. In a way, these “databases” provide namespaces for tables; identically named tables can exist in two different “databases” without collision. As an example, we can use the PostgreSQL instance available when locally installing Superset with Docker Compose. In this instance of PostgreSQL, we have four databases: postgres, superset, template0, and template1.
superset@localhost:superset> \l
+-----------+----------+----------+------------+------------+-----------------------+
| Name | Owner | Encoding | Collate | Ctype | Access privileges |
|-----------+----------+----------+------------+------------+-----------------------|
| postgres | superset | UTF8 | en_US.utf8 | en_US.utf8 | <null> |
| superset | superset | UTF8 | en_US.utf8 | en_US.utf8 | <null> |
| template0 | superset | UTF8 | en_US.utf8 | en_US.utf8 | =c/superset |
| | | | | | superset=CTc/superset |
| template1 | superset | UTF8 | en_US.utf8 | en_US.utf8 | =c/superset |
| | | | | | superset=CTc/superset |
+-----------+----------+----------+------------+------------+-----------------------+
We can look into the superset database and see the tables in that database.
The key thing to remember here is that ultimately a Superset database needs to resolve to a collection of tables, whatever that is referred to in a particular dialect.
superset@localhost:superset> \c superset
You are now connected to database "superset" as user "superset"
+--------+----------------------------+-------+----------+
| Schema | Name | Type | Owner |
|--------+----------------------------+-------+----------|
| public | Clean | table | superset |
| public | FCC 2018 Survey | table | superset |
| public | ab_permission | table | superset |
| public | ab_permission_view | table | superset |
| public | ab_permission_view_role | table | superset |
| public | ab_register_user | table | superset |
| public | ab_role | table | superset |
| public | ab_user | table | superset |
| public | ab_user_role | table | superset |
| public | ab_view_menu | table | superset |
| public | access_request | table | superset |
| public | alembic_version | table | superset |
| public | alert_logs | table | superset |
| public | alert_owner | table | superset |
| public | alerts | table | superset |
| public | annotation | table | superset |
| public | annotation_layer | table | superset |
| public | bart_lines | table | superset |
| public | birth_france_by_region | table | superset |
| public | birth_names | table | superset |
| public | cache_keys | table | superset |
| public | channel_members | table | superset |
| public | channels | table | superset |
| public | cleaned_sales_data | table | superset |
| public | clusters | table | superset |
| public | columns | table | superset |
| public | covid_vaccines | table | superset |
:
With an understanding of dialects, drivers, and databases under our belt, let’s solidify it with a few examples. Let’s assume we want to create a Superset database connection to a PostgreSQL data source and a particular PostgreSQL database named mydatabase. Our PostgreSQL data source is hosted at pghost on port 5432, and we will log in as sonny (password is foobar). Here are three SQLAlchemy URIs we could use (inspired by the SQLAlchemy documentation):
postgresql+psycopg2://sonny:foobar@pghost:5432/mydatabase
We explicitly specify the postgresql dialect and psycopg2 driver.
postgresql+pg8000://sonny:foobar@pghost:5432/mydatabase
We use the pg8000 driver.
postgresql://sonny:foobar@pghost:5432/mydatabase
We do not explicitly list any driver, and hence SQLAlchemy will use the default driver, which is psycopg2 for postgresql.
Superset lists its recommended Python packages for database drivers in the public documentation.
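For Presto, that package is PyHive, the DB-API driver mentioned earlier. As a minimal sketch (the coordinator host and username are hypothetical), the underlying DB-API connection that Superset ultimately relies on looks like this:
# pip install pyhive
from pyhive import presto

# Connect to a hypothetical coordinator, scoped to a catalog and schema
conn = presto.connect(
    host="presto-coordinator",
    port=8080,
    username="alice",
    catalog="postgresql",
    schema="public",
)
cursor = conn.cursor()
cursor.execute("SHOW TABLES")
print(cursor.fetchall())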
Presto Catalogs
Because Presto can connect to multiple data sources, when connecting to Presto as a defined Superset database, it’s important to understand what you are actually making a connection to.
In Presto, the equivalent notion of a “database” (i.e. a logical collection of tables) is called a schema. Access to a specific schema (“database”) in a data source is defined in a catalog.
As an example, the listing below is the equivalent catalog configuration to connect to the example mydatabase PostgreSQL database we described previously (saved, for instance, as etc/catalog/postgresql.properties, since the properties file name determines the catalog name). If we were querying a table in that catalog directly from Presto, a fully-qualified table would be specified as catalog.schema.table (e.g. select * from catalog.schema.table). Hence, querying the Clean table, which lives in the public schema per the earlier listing, would be select * from postgresql.public.Clean.
connector.name=postgresql
connection-url=jdbc:postgresql://pghost:5432/mydatabase
connection-user=sonny
connection-password=foobar

Superset to Presto
Going back to Superset, to create a Superset database to connect to Presto, we specify the Presto dialect. However, because Presto is the intermediary to an underlying data source, such as PostgreSQL, the username
and password
we need to provide (and authenticate against) is the Presto username and password. Further, we must specify a Presto catalog for the database
in the SQLAlchemy URI. From there, Presto—-through its catalog configuration—-authenticates to the backing data source with the appropriate credentials (e.g sonny
and foobar
). Hence, the SQLAlchemy URI to connect to Presto in Superset is as follows: presto://<presto-username>:<presto-password>@<presto-coordinator-url>:<http-server-port>/<catalog>

The http-server-port refers to the http-server.http.port configuration on the coordinator and workers (see Presto config properties); it is usually set to 8080.
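Putting it all together, here is a hedged sketch of the connection Superset makes with such a URI, using SQLAlchemy with the presto dialect registered by PyHive (the hostname and username are hypothetical; PyHive only transmits a password over HTTPS, so this plain-HTTP example omits it):
from sqlalchemy import create_engine

# The "database" part of a presto:// URI is the Presto catalog
engine = create_engine("presto://alice@presto-coordinator:8080/postgresql")
with engine.connect() as connection:
    # The catalog is fixed by the URI, so tables within it are addressed
    # as <schema>.<table>
    for row in connection.execute("SHOW SCHEMAS"):
        print(row)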
New Superset Database Connection UI
In Superset 1.3, there is a feature-flagged version of a new database connection UI that simplifies connecting to data without constructing the SQLAlchemy URI. The new database connection UI can be turned on in config.py with FORCE_DATABASE_CONNECTIONS_SSL = True (PR #14934). The new UI can also be viewed in the Superset documentation.
Try It Out!
In less than 30 minutes, you can get up and running with Superset and a Presto cluster using Ahana Cloud for Presto. Ahana Cloud for Presto is an easy-to-use, fully managed Presto service that also automatically stands up a Superset instance for you. It’s free to try out for 14 days, then it’s pay-as-you-go through the AWS marketplace.