The individual Presto cluster screen shows the cluster details and lets you manage the cluster.
For each Presto cluster created, the following resources are provisioned:
- An Ahana-managed Hive Metastore - This is pre-attached to the cluster via a catalog named ahana_hive. The Hive Metastore runs as a container on an EC2 instance of your choice. It is backed by PostgreSQL, which Ahana also provisions as a container running on the same instance.
- A Presto Coordinator - The coordinator is provisioned on an EC2 instance of your choice.
- Presto Workers - The number of workers and the instance type are selected when the cluster is created. The number of workers can be modified after the cluster has been created.
As a reminder, an instance of Apache Superset was provisioned during the compute plane creation.
The first section provides the basic cluster information.
- Version of Presto - Ahana Cloud is based on PrestoDB. This field shows the version of PrestoDB that the cluster is running on.
- Number of worker nodes - This shows the current number of workers provisioned. The cluster can be resized.
- Uptime - This is the time elapsed since the cluster last became active.
- Cluster instance types - This is the EC2 instance type for each of the cluster resources:
- The Presto coordinator
- The Presto workers
- The Hive metastore (optional)
Each Presto cluster comes pre-attached with an Ahana-managed Hive Metastore. The metastore is a data catalog that lets you map database concepts such as databases, tables, and columns to files stored in data lakes like Amazon S3. The metastore is pre-configured to use an Amazon S3 bucket for storage, which makes it easy to get started using Presto as a data warehouse. No additional configuration or changes to config properties are needed.
The Ahana-managed Hive Metastore (HMS) allows you to create managed (internal) tables. Tables created in Presto using the ahana_hive catalog are treated as managed tables in Hive, and Hive assumes that it owns the data for these tables. The Ahana HMS is pre-configured to store managed tables in an Amazon S3 data lake: each cluster has an Amazon S3 bucket that is pre-created, and Ahana configures the HMS to point to it.
The name of the Amazon S3 bucket is displayed in the Cluster Storage Information section. This bucket is created in your account and you can access the data via the Amazon S3 console. The S3 bucket name will include a shortened name of the cluster that the bucket is attached to. Example:
Example of a managed (internal) table:
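As a minimal sketch, the Python helper below builds a Presto CREATE TABLE statement against the ahana_hive catalog. Because no external_location property is set, the Hive connector would create the table as a managed table in the cluster's pre-configured bucket. The sales schema, orders table, columns, file format, and the helper name are hypothetical and chosen only for illustration.

```python
from typing import Mapping

CATALOG = "ahana_hive"  # catalog name taken from the text above


def managed_table_ddl(schema: str, table: str, columns: Mapping[str, str],
                      fmt: str = "PARQUET") -> str:
    """Build a CREATE TABLE statement with no external_location,
    so the Hive Metastore treats the table as managed."""
    cols = ",\n  ".join(f"{name} {sql_type}" for name, sql_type in columns.items())
    return (
        f"CREATE TABLE {CATALOG}.{schema}.{table} (\n  {cols}\n)\n"
        f"WITH (format = '{fmt}')"
    )


if __name__ == "__main__":
    # Hypothetical schema, table, and columns.
    print(managed_table_ddl("sales", "orders",
                            {"order_id": "BIGINT", "amount": "DOUBLE"}))
```

The resulting statement can be run from presto-cli or any other client connected to the cluster.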
To locate all Amazon S3 buckets created by Ahana, search for the ahana-cf-eksng-stack prefix in your Amazon S3 console.
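The same prefix search can be done programmatically. The sketch below filters a list of bucket names by the prefix mentioned above. Fetching the names with boto3 is an assumption about your tooling and requires AWS credentials with permission to list buckets; the sample bucket names are hypothetical.

```python
from typing import Iterable, List

AHANA_BUCKET_PREFIX = "ahana-cf-eksng-stack"  # prefix quoted in the text above


def ahana_buckets(bucket_names: Iterable[str],
                  prefix: str = AHANA_BUCKET_PREFIX) -> List[str]:
    """Return, sorted, the bucket names that start with the Ahana prefix."""
    return sorted(name for name in bucket_names if name.startswith(prefix))


def list_account_buckets() -> List[str]:
    """Fetch all bucket names in the account (assumes boto3 is installed)."""
    import boto3  # imported lazily so the filter above works without it

    response = boto3.client("s3").list_buckets()
    return [bucket["Name"] for bucket in response["Buckets"]]


if __name__ == "__main__":
    # Hypothetical names; replace with list_account_buckets() in a real account.
    sample = ["ahana-cf-eksng-stack-demo", "my-logs", "ahana-cf-eksng-stack-etl"]
    print(ahana_buckets(sample))
```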
Ahana configures the Hive metastore to point the hive.metastore.warehouse.dir property to the S3 bucket created for each cluster. By default, data for managed tables created in this HMS is stored in a folder path similar to /databasename.db/tablename/ in the S3 bucket. If a managed table or partition is dropped, both the data and the metadata associated with that table or partition are deleted. This means that the files stored in S3 are also deleted.
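To illustrate the default layout, the sketch below derives the expected storage location of a managed table from the bucket, database, and table names. It assumes the warehouse directory is the bucket root and uses the s3:// scheme; both are assumptions, so treat the result as an approximation of where to look rather than an exact path. The bucket and table names are hypothetical.

```python
def managed_table_location(bucket: str, database: str, table: str) -> str:
    """Approximate location of a managed table, following the
    /databasename.db/tablename/ pattern described above."""
    return f"s3://{bucket}/{database}.db/{table}/"


if __name__ == "__main__":
    # Hypothetical bucket, database, and table names.
    print(managed_table_location("example-cluster-bucket", "sales", "orders"))
```

Dropping the managed table would remove the files under this location along with the metadata.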
This is not the case for external tables. Compared with managed (internal) tables, external tables point to a location that is different from the pre-configured storage location. The HMS needs to be able to access the storage location.
You can also create external tables in the Ahana-managed HMS via Presto, using presto-cli or any other tool that connects to Presto. If an external table or partition is dropped, only the metadata associated with that table or partition is deleted from the Ahana-managed HMS.
Example of an external table:
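As a minimal sketch, the helper below builds a CREATE TABLE statement that sets the Presto Hive connector's external_location table property, which is what makes the table external. The table name, columns, file format, and S3 path are hypothetical, and the HMS must be able to access the location you supply.

```python
from typing import Mapping


def external_table_ddl(table: str, columns: Mapping[str, str],
                       location: str, fmt: str = "PARQUET") -> str:
    """Build a CREATE TABLE statement whose data lives at `location`."""
    if "'" in location:
        # Guard against a location that would break the quoted SQL literal.
        raise ValueError("location must not contain single quotes")
    cols = ",\n  ".join(f"{name} {sql_type}" for name, sql_type in columns.items())
    return (
        f"CREATE TABLE {table} (\n  {cols}\n)\n"
        f"WITH (format = '{fmt}', external_location = '{location}')"
    )


if __name__ == "__main__":
    # Hypothetical table and S3 path outside the Ahana-created bucket.
    print(external_table_ddl(
        "ahana_hive.sales.orders_ext",
        {"order_id": "BIGINT", "amount": "DOUBLE"},
        "s3://example-data-bucket/orders/",
    ))
```

Dropping a table created this way removes only its metadata; the files at the external location stay in place.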
Learn more about Hive Managed and External tables here
Ahana creates one Amazon S3 bucket for each cluster that is pre-configured to be the backend data warehouse storage for the Hive metastore for managed tables.
Like the Ahana-managed Hive metastore, this S3 bucket is preserved whenever the Presto cluster is stopped or restarted.
When a Presto cluster is deleted, the Ahana-managed Hive metastore, the Presto coordinator, and all Presto workers are deleted as well. However, because the Amazon S3 bucket can hold important data, Ahana retains the bucket and does not delete it. Users can manage these S3 buckets as needed via the AWS console.
The Cluster Connection Details section includes useful information about cluster endpoints and lets you change the cluster credentials.
The Presto cluster endpoint can be used to connect to the cluster via various tools like the
The JDBC connection can be used to connect tools like Tableau, Looker, and other reporting and dashboarding tools to the Presto cluster.
Click the icon to quickly copy connection information and other details about the cluster.
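For tools that ask for a JDBC URL, the sketch below assembles one in the Presto JDBC driver's jdbc:presto://host:port/catalog/schema form. The host name is hypothetical, and port 443 with SSL=true is an assumption about how the endpoint is exposed; copy the actual values from the Cluster Connection Details section.

```python
from typing import Optional
from urllib.parse import urlencode


def presto_jdbc_url(host: str, port: int = 443,
                    catalog: Optional[str] = None,
                    schema: Optional[str] = None,
                    ssl: bool = True) -> str:
    """Assemble a Presto JDBC URL; catalog and schema are optional."""
    if schema and not catalog:
        raise ValueError("a schema requires a catalog")
    path = "".join(f"/{part}" for part in (catalog, schema) if part)
    query = urlencode({"SSL": "true"}) if ssl else ""
    return f"jdbc:presto://{host}:{port}{path}" + (f"?{query}" if query else "")


if __name__ == "__main__":
    # Hypothetical endpoint; ahana_hive is the pre-attached catalog.
    print(presto_jdbc_url("presto.example.com", catalog="ahana_hive", schema="default"))
```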
The Ahana Compute Plane provisions an instance of Apache Superset, a very popular open source dashboarding tool.
Apache Superset runs in a container on Amazon EKS on a t3.medium instance. It is meant to be an admin sandbox for testing cluster connectivity. For reporting and dashboards, we recommend connecting your own tools like Tableau, Looker, Preset, and others to Presto via the JDBC driver.
You can access Apache Superset by clicking on the and entering your Ahana username and password. The Superset instance is at the compute plane level so it can connect to all Presto clusters in this compute plane.
This section shows the attached data sources and catalogs.
Data sources attached to the cluster can be changed by clicking on the Change data sources button. Visit the Data sources in Ahana Cloud page for more details about managing data sources.