Presto Cluster Overview

The individual Presto cluster screen lets you view the cluster's details as well as manage the cluster.

Overview#

For each Presto cluster created, the following resources are provisioned:

  • An Ahana-managed Hive Metastore - This is pre-attached to the cluster via a catalog named ahana_hive. The Hive Metastore runs as a container on an EC2 instance of your choice. It is backed by PostgreSQL, also provisioned by Ahana as a container running on the same instance.

  • A Presto Coordinator - The coordinator is provisioned using an EC2 instance of your choice.

  • Presto Workers - The number of workers and the instance type is selected during cluster creation time. The number of workers can be modified after the cluster has been created.

As a reminder, an instance of Apache Superset was provisioned during compute plane creation.

Ahana Compute Plane

Presto cluster information#

The first section provides the basic cluster information.

  • Version of Presto - Ahana Cloud is based on PrestoDB. This field shows the version of PrestoDB that the cluster is running on.
  • Number of worker nodes - This shows the current number of workers provisioned. The cluster can be resized.
  • Uptime - This is the time elapsed since the cluster last became active.
  • Cluster instance types - This is the EC2 instance type for each of the cluster resources:
    • The Presto coordinator
    • The Presto workers
    • The Hive metastore (optional)

Basic information about the cluster

Presto cluster storage information#

Each Presto cluster comes pre-attached with an Ahana-managed Hive Metastore. The metastore is a data catalog that maps database concepts like databases, tables, and columns to files stored in data lakes like Amazon S3. The metastore is pre-configured to use an Amazon S3 bucket for storage, which makes it easy to start using Presto as a data warehouse. No additional configuration or changes to any config properties are needed.
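
For example, you can create a schema (database) in the pre-attached ahana_hive catalog directly from Presto; the schema name demo below is just an illustration:

-- Create a schema in the pre-attached ahana_hive catalog;
-- its definition is stored in the Ahana-managed Hive Metastore
CREATE SCHEMA ahana_hive.demo;

-- Confirm the new schema is visible
SHOW SCHEMAS FROM ahana_hive;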

Architecture for metadata & storage

Ahana-managed Hive Metastore and Amazon S3 storage#

The provisioned Ahana-managed Hive Metastore (HMS) allows you to create managed (internal) tables. This means that tables created in Presto using the ahana_hive catalog are treated as managed tables in Hive, and Hive assumes that it owns the data for these tables. The Ahana HMS is pre-configured to store managed tables in an Amazon S3 data lake: an Amazon S3 bucket is pre-created and configured for each cluster, and Ahana points the HMS at that bucket.

The name of the Amazon S3 bucket is displayed in the Cluster Storage Information section. This bucket is created in your account and you can access the data via the Amazon S3 console. The S3 bucket name will include a shortened name of the cluster that the bucket is attached to. Example: s3a://ahana-cf-eksng-stack-telemetryclu-hmswarehousedir-j123456abcd12.

Amazon S3 information

Example of a managed (internal) table:

CREATE TABLE orders (
  orderkey bigint,
  orderstatus varchar,
  totalprice double,
  orderdate date
);
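
As a quick illustration, assuming the table above was created in the default schema of the ahana_hive catalog, you can insert and query data right away (the values below are made up):

-- Insert a sample row (values are illustrative only)
INSERT INTO ahana_hive.default.orders
VALUES (1, 'OPEN', 150.25, DATE '2021-06-01');

-- Query the managed table; its data files live in the cluster's S3 bucket
SELECT * FROM ahana_hive.default.orders;
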
tip

Easily locate all Amazon S3 buckets created by Ahana by searching for the ahana-cf-eksng-stack prefix in your Amazon S3 console.

Ahana configures the Hive Metastore's hive.metastore.warehouse.dir property to point to the S3 bucket created for each cluster. By default, data for managed tables created in this HMS is stored in a folder path similar to /databasename.db/tablename/ in the S3 bucket. If a managed table or partition is dropped, both the data and the metadata associated with that table or partition are deleted, which means the files stored in S3 are deleted as well.
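
For example, dropping the managed orders table created earlier removes both its metastore entry and its data files from the warehouse bucket (the path in the comment is illustrative):

-- Data files for this managed table live under a path similar to
-- /default.db/orders/ in the Ahana-created S3 bucket.
-- Dropping the table deletes the metadata AND those S3 files.
DROP TABLE ahana_hive.default.orders;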

This is not the case for external tables. Unlike managed (internal) tables, external tables point to a storage location other than the pre-configured warehouse location, and the HMS needs to be able to access that location.

You can also create external tables in the Ahana-managed HMS via Presto, either through the presto-cli or any other tool that connects to Presto. If an external table or partition is dropped, only the metadata associated with that table or partition is deleted from the Ahana-managed HMS.

Example of an external table:

CREATE TABLE airports (
  iata varchar,
  airport varchar,
  city varchar,
  state varchar,
  country varchar,
  lat real,
  long real
)
WITH (
  format = 'PARQUET',
  external_location = 's3a://ahana-test10001/presto/'
);
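
In contrast to a managed table, dropping this external table only removes its definition from the Ahana-managed HMS; the Parquet files under the external location are left in place:

-- Removes the airports table from the metastore,
-- but the files in s3a://ahana-test10001/presto/ are not deleted
DROP TABLE airports;
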
tip

Learn more about Hive Managed and External tables here

Managing Ahana created Amazon S3 buckets#

Ahana creates one Amazon S3 bucket for each cluster, pre-configured as the Hive Metastore's backend data warehouse storage for managed tables.

Presto clusters with S3 buckets

  • Like the Ahana-managed Hive Metastore, this S3 bucket persists across Presto cluster stops and restarts.

  • When a Presto cluster is deleted, the Ahana-managed Hive Metastore, the Presto coordinator, and all Presto workers are also deleted. However, because the Amazon S3 bucket can contain important data, it is not deleted by Ahana and persists after the cluster is gone. You can manage these S3 buckets as needed via the AWS console.

Presto cluster connection details#

The Cluster Connection Details section includes useful information about cluster endpoints and lets you change the cluster credentials.

Presto cluster endpoints

The Presto cluster endpoint can be used to connect to the cluster via various tools like the presto-cli.

./presto-cli --server https://my-cluster-1.my-domain.cp.ahana.cloud --user <username> --password
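
Once connected, for example via the presto-cli command above, a couple of quick queries can confirm that the cluster and its pre-attached catalogs are reachable; this is optional and purely a sanity check:

-- List the catalogs attached to the cluster; ahana_hive should appear
SHOW CATALOGS;

-- Show the coordinator and worker nodes that make up the cluster
SELECT * FROM system.runtime.nodes;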

The JDBC connection can be used to connect tools like Tableau, Looker, and other reporting and dashboarding tools to the Presto cluster.

tip

Click on the Ahana Copy Button icon to quickly copy connection information and other details about the cluster.

Apache Superset Information#

The Ahana Compute Plane provisions an instance of Apache Superset, a very popular open source dashboarding tool.

Apache Superset endpoints

note

Apache Superset runs in a container on Amazon EKS on a t3.medium instance. It is meant to be an admin sandbox for testing cluster connectivity. For reporting and dashboards, we recommend connecting your own tools like Tableau, Looker, Preset, and others to Presto via the JDBC driver.

Accessing Apache Superset#

You can access Apache Superset by clicking on the Open link button and entering your Ahana username and password. The Superset instance runs at the compute plane level, so it can connect to all Presto clusters in the compute plane.

Apache Superset Login

Presto cluster data sources#

This section shows the attached data sources and catalogs.

Attached data sources

Data sources attached to the cluster can be changed by clicking on the Change data sources button. Visit the Data sources in Ahana Cloud page for more details about managing data sources.
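
Once a data source is attached, it shows up as an additional catalog that can be queried on its own or combined with other catalogs in a single federated query. The catalog, schema, table, and column names below are hypothetical and only illustrate the idea:

-- Query a table from a hypothetical attached catalog
SELECT * FROM mypostgresql.public.customers LIMIT 10;

-- Join the hypothetical catalog with a table in the Ahana-managed Hive Metastore
SELECT c.custkey, c.name, count(*) AS order_count
FROM mypostgresql.public.customers c
JOIN ahana_hive.default.orders o
  ON o.custkey = c.custkey
GROUP BY c.custkey, c.name;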