Ahana Cloud supports user-managed external catalogs. Amazon Glue is a popular AWS service that includes the Glue Data Catalog, which manages metadata for structured data stored in an Amazon S3 data lake.
This page walks you through adding an Amazon Glue Data Catalog to Ahana Cloud for Presto.
Start by clicking the Add a data source button on the Data sources tab.
- Give your Amazon Glue catalog a name. Ahana derives the Presto catalog name from it by removing any special characters that may not be Presto-friendly, and the Data sources screen shows both the name you gave and the derived catalog name.
- Enter a description of the Glue catalog being added.
- Select the AWS Region in which the Amazon Glue catalog you want to add resides.
- Provide a Role ARN that has access to the Amazon S3 buckets you want to query with Presto.
- Provide a Role ARN that has access to the Amazon Glue service you want to use for the metadata and schemas.
You can provide the same AWS IAM Role ARN for Amazon Glue and Amazon S3, as long as the role has access to both services.
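The form fields above can be modeled as a small record with basic checks before you submit them. This is a hypothetical Python sketch, not Ahana's API: the `GlueDataSource` class, its field names, and the role ARN pattern are assumptions for illustration. It also shows that a single role ARN may fill both role fields.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Assumed shape of an IAM role ARN: arn:<partition>:iam::<12-digit account>:role/<path/name>
ROLE_ARN_RE = re.compile(r"^arn:aws[a-z-]*:iam::\d{12}:role/[\w+=,.@/-]+$")


@dataclass
class GlueDataSource:
    """Hypothetical model of the fields on the Add a data source form."""

    name: str
    description: str
    region: str
    s3_role_arn: str
    glue_role_arn: str

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the fields look well-formed."""
        problems = []
        if not self.name.strip():
            problems.append("name must not be empty")
        for field_name in ("s3_role_arn", "glue_role_arn"):
            value = getattr(self, field_name)
            if not ROLE_ARN_RE.match(value):
                problems.append(f"{field_name} is not an IAM role ARN: {value!r}")
        return problems

    def uses_single_role(self) -> bool:
        """True when one role ARN is used for both Glue and S3 (fine if it can reach both)."""
        return self.s3_role_arn == self.glue_role_arn


if __name__ == "__main__":
    role = "arn:aws:iam::123456789012:role/ahana-glue"  # hypothetical role
    src = GlueDataSource("My Glue", "demo", "us-east-1", role, role)
    print(src.validate(), src.uses_single_role())
```

The check only covers the ARN's shape; whether the role can actually reach Glue and S3 is decided by the IAM policies configured in the steps that follow.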
The Catalog Name column on the Data sources screen shows the internal name that Presto uses.
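To get a feel for how a display name might map to a catalog name, here is a hypothetical sketch. Ahana's exact rule is not described here, so the rule below (lowercase, collapse each run of characters outside `[a-z0-9_]` into one underscore, trim edge underscores) is an assumption; always read the real name from the Data sources screen.

```python
import re


def to_presto_catalog_name(display_name: str) -> str:
    """Hypothetical sketch: derive a Presto-friendly catalog name from a display name.

    Assumed rule, not Ahana's documented algorithm: lowercase the name, replace every
    run of characters outside [a-z0-9_] with a single underscore, and trim
    leading/trailing underscores.
    """
    lowered = display_name.strip().lower()
    cleaned = re.sub(r"[^a-z0-9_]+", "_", lowered)
    return cleaned.strip("_")


if __name__ == "__main__":
    print(to_presto_catalog_name("Sales Glue (prod)"))
```

Under this assumed rule, `"Sales Glue (prod)"` becomes `sales_glue_prod`, which you could then reference in Presto as the catalog part of `catalog.schema.table`.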
The Presto clusters running in the Ahana Compute Plane need access to your Amazon Glue catalog for the metadata as well as your Amazon S3 buckets for the data.
Ahana uses named AWS IAM roles with Presto. Even though the compute plane and the Presto clusters are deployed in your account, AWS still requires that you grant the role being used access to Amazon Glue and Amazon S3.
You can use the same role or different roles for Amazon Glue and Amazon S3.
Go to the IAM Roles page in the AWS console.
Select the Glue role you entered in Ahana's "Glue Role ARN" field (see above), then click Attach Policies. If you don't want to use an existing role, you can create a new one.
Filter on S3 and select AmazonS3FullAccess from the list of policies.
Next, filter on Glue and select AWSGlueConsoleFullAccess from the list of policies.
Attach both policies.
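If you prefer scripting to the console, the same two attachments map to the AWS CLI command `aws iam attach-role-policy`. The Python sketch below builds those commands for a role name you supply (the name `ahana-glue-role` in the example is hypothetical) and only runs them when `dry_run` is `False`, which assumes the AWS CLI is installed and configured.

```python
from __future__ import annotations

import shlex
import subprocess

# ARNs of the AWS managed policies selected in the console steps.
MANAGED_POLICY_ARNS = [
    "arn:aws:iam::aws:policy/AmazonS3FullAccess",
    "arn:aws:iam::aws:policy/AWSGlueConsoleFullAccess",
]


def attach_policy_commands(role_name: str) -> list[list[str]]:
    """Build one `aws iam attach-role-policy` argv per managed policy."""
    return [
        ["aws", "iam", "attach-role-policy", "--role-name", role_name, "--policy-arn", arn]
        for arn in MANAGED_POLICY_ARNS
    ]


def attach_policies(role_name: str, dry_run: bool = True) -> None:
    """Print each command; execute it with the AWS CLI only when dry_run is False."""
    for argv in attach_policy_commands(role_name):
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    attach_policies("ahana-glue-role")  # hypothetical role name; dry run only
```

Defaulting to a dry run lets you review the exact commands before anything changes in your account.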
Next, go to the Trust relationships tab and click Edit trust relationship.
Copy and paste the following into the JSON editor. This allows the role to be assumed so that any Ahana Presto cluster can access Amazon Glue or Amazon S3. If you want to grant access only to certain Ahana Presto clusters, see the section below.
Remember to replace <accountNumber> in the JSON with your AWS account number.
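A trust policy of this kind can be sketched in Python, assuming it trusts your account's root principal (`arn:aws:iam::<accountNumber>:root`) for `sts:AssumeRole`. That assumption stands in for the JSON Ahana provides, so compare the output with it before use; the function name is hypothetical.

```python
import json
import re


def account_trust_policy(account_number: str) -> str:
    """Sketch of a trust policy that lets principals in `account_number` assume the role.

    Assumption: trusting the account root is how "any Ahana Presto cluster"
    in your account is granted access; check against Ahana's provided JSON.
    """
    if not re.fullmatch(r"\d{12}", account_number):
        raise ValueError("AWS account numbers are 12 digits")
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{account_number}:root"},
                "Action": "sts:AssumeRole",
            }
        ],
    }
    return json.dumps(policy, indent=2)


if __name__ == "__main__":
    print(account_trust_policy("123456789012"))  # placeholder account number
```

Trusting the account root delegates the decision to identity policies in that account: any principal there that is allowed `sts:AssumeRole` on this role can assume it, which is why narrowing the principal to specific cluster roles is the tighter option.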
To restrict access to Amazon Glue and Amazon S3 to only certain clusters, you can provide the ARN for the nodeInstance role that Ahana creates for each cluster.
Select the role you configured in Ahana for Amazon Glue or Amazon S3, go to the Trust relationships tab, and click Edit trust relationship.
- Copy and paste the following into the JSON editor. This allows only the Ahana Presto clusters whose node instance role ARNs you list to assume the role and access Amazon Glue or Amazon S3.
- To find the ARN that replaces <YOUR Ahana Cluster Node instance ARN > in the JSON, filter on AHANA-CF-EKSNG-STACK- on the Roles screen.
- Open the role for the cluster you want to grant access to Amazon Glue, copy its role ARN, and paste it into the JSON.
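The lookup and the restricted policy can be sketched together in Python. The input mirrors the `RoleName`/`Arn` entries that IAM's ListRoles returns; the function names and sample role names are hypothetical, and the substring match imitates the Roles screen filter. Listing role ARNs as the `Principal` limits who can assume the role to those cluster roles.

```python
from __future__ import annotations

import json

# Filter text used on the Roles screen to find Ahana cluster node roles.
NODE_ROLE_FILTER = "AHANA-CF-EKSNG-STACK-"


def find_node_role_arns(roles: list[dict]) -> list[str]:
    """Return the ARNs of roles whose RoleName contains the Ahana node-group filter text."""
    return [r["Arn"] for r in roles if NODE_ROLE_FILTER in r["RoleName"]]


def cluster_trust_policy(node_role_arns: list[str]) -> str:
    """Sketch of a trust policy that only the given node instance roles can assume."""
    if not node_role_arns:
        raise ValueError("at least one node instance role ARN is required")
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": node_role_arns},
                "Action": "sts:AssumeRole",
            }
        ],
    }
    return json.dumps(policy, indent=2)


if __name__ == "__main__":
    sample = [  # hypothetical ListRoles-style entries
        {"RoleName": "AHANA-CF-EKSNG-STACK-demo-NodeInstanceRole",
         "Arn": "arn:aws:iam::123456789012:role/AHANA-CF-EKSNG-STACK-demo-NodeInstanceRole"},
        {"RoleName": "unrelated-role", "Arn": "arn:aws:iam::123456789012:role/unrelated-role"},
    ]
    print(cluster_trust_policy(find_node_role_arns(sample)))
```

To add another cluster later, append its node instance role ARN to the list and update the trust relationship again.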
You can populate the Amazon Glue Data Catalog by defining databases and tables yourself or by running a Glue crawler.