
Add a Glue Catalog for S3

Ahana Cloud supports user-managed external catalogs. Amazon Glue is a popular AWS service that includes the Glue Data Catalog, which manages metadata for structured data stored in an Amazon S3 data lake.

This page walks you through adding an Amazon Glue Data Catalog to Ahana Cloud for Presto.

Add data source screen

Start by clicking the Add a data source button on the Data sources tab.

Ahana Data Sources for Presto

Select Amazon Glue Data Source

Configure AWS Glue for Presto

Configure Amazon Glue Data Source general information

  • Give your Amazon Glue catalog a name. Ahana derives the catalog name used for Presto from this name by removing special characters that may not be Presto-friendly. The Data sources screen shows both the name you gave and the derived catalog name.
  • Enter a description of the Glue catalog being added.


Configure Amazon Glue Data Source access details

  • Select the AWS Region where the Amazon Glue catalog you want to add resides.
  • Provide a Role ARN that has access to the Amazon S3 data you want to query with Presto.
  • Provide a Role ARN that has access to the Amazon Glue service you want to use for the metadata and schemas.
note

You can provide the same AWS IAM Role ARN for Amazon Glue and Amazon S3, as long as the role has access to both services.


tip

The Catalog Name column on the Data sources screen shows the internal name used for Presto.


Configure AWS Roles to grant Ahana Presto access to Amazon Glue and Amazon S3

The Presto clusters running in the Ahana Compute Plane need access to your Amazon Glue catalog for the metadata as well as your Amazon S3 buckets for the data.

Ahana uses named Amazon IAM Roles with Presto. Even though the compute plane and the Presto clusters are deployed in your account, AWS still requires that you grant the role being used access to Amazon Glue and Amazon S3.

You can use the same role or different roles for Amazon Glue and Amazon S3.

  • Go to the Amazon IAM Roles page on the AWS console.

  • Select the Glue role you configured in Ahana's "Glue Role ARN" field (see above), and then click Attach Policies. If you don't want to use an existing role, you can create a new one.

  • Filter on S3 and select AmazonS3FullAccess from the list of policies.

  • Next, filter on Glue and select AWSGlueConsoleFullAccess from the list of policies.

  • Attach both policies.

  • Next, go to Trust Relationships and click Edit trust relationship.

  • Copy and paste the following into the JSON editor. This allows the role to be assumed, so that any Ahana Presto cluster can access Amazon Glue or Amazon S3. If you want to grant access to only certain Ahana Presto clusters, see the section below.

important

Remember to replace <accountNumber> in the JSON below with your AWS account number.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<accountNumber>:root"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
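
The console steps above can also be scripted. The following is a minimal sketch, not an official Ahana or AWS procedure: it builds the same trust policy for a given account number and applies it, together with the two managed policies, through an IAM client. The function names `build_account_trust_policy` and `configure_role` are hypothetical, and `iam_client` is assumed to be a boto3 IAM client (`boto3.client("iam")`), whose `attach_role_policy` and `update_assume_role_policy` calls are used here.

```python
import json
import re

# AWS-managed policies named in the steps above.
MANAGED_POLICY_ARNS = [
    "arn:aws:iam::aws:policy/AmazonS3FullAccess",
    "arn:aws:iam::aws:policy/AWSGlueConsoleFullAccess",
]


def build_account_trust_policy(account_number: str) -> dict:
    """Return the trust policy shown above for one AWS account."""
    # Catch an unreplaced "<accountNumber>" placeholder or a typo early.
    if not re.fullmatch(r"[0-9]{12}", account_number):
        raise ValueError("AWS account numbers are exactly 12 digits")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{account_number}:root"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def configure_role(iam_client, role_name: str, account_number: str) -> None:
    """Attach both managed policies, then replace the role's trust policy.

    iam_client is assumed to be a boto3 IAM client; it is passed in so the
    function itself needs no AWS credentials at import time.
    """
    for policy_arn in MANAGED_POLICY_ARNS:
        iam_client.attach_role_policy(RoleName=role_name, PolicyArn=policy_arn)
    iam_client.update_assume_role_policy(
        RoleName=role_name,
        PolicyDocument=json.dumps(build_account_trust_policy(account_number)),
    )
```

With boto3 installed and credentials configured, a call might look like `configure_role(boto3.client("iam"), "my-glue-role", "123456789000")`, where `my-glue-role` is a placeholder for the role you entered in Ahana.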

Restrict access to Amazon Glue and Amazon S3 for only specific clusters

  • To restrict access to Amazon Glue and Amazon S3 to certain clusters only, provide the ARN of the nodeInstance role that Ahana creates for each cluster.

  • Select the role you configured in Ahana for Amazon Glue or Amazon S3, go to Trust Relationships, and click Edit trust relationship.

    • Copy and paste the following into the JSON editor, replacing <YOUR Ahana Cluster Node instance ARN> with the ARN of the cluster's nodeInstance role. This allows only that Ahana Presto cluster to assume the role and access Amazon Glue or Amazon S3.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "<YOUR Ahana Cluster Node instance ARN>"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

  • To find the ARN that replaces <YOUR Ahana Cluster Node instance ARN> in the JSON above, filter on AHANA-CF-EKSNG-STACK- on the Roles screen.

  • Open the role for the cluster you want to grant access to Amazon Glue, copy its role ARN, and paste it into the JSON.

Example: arn:aws:iam::123456789000:role/AHANA-CF-EKSNG-STACK-YOUR-CLUSTER-NAME-NodeInstanceRole-SOME-HASH

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::123456789000:role/AHANA-CF-EKSNG-STACK-YOUR-CLUSTER-NAME-NodeInstanceRole-SOME-HASH"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
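
As a rough illustration of the cluster-restricted variant, the sketch below picks out node instance roles by the AHANA-CF-EKSNG-STACK- name prefix and builds a trust policy whose principal is one or more of those role ARNs. The helper names are hypothetical, and the input is assumed to be a list of role records shaped like the `Roles` entries returned by the IAM `ListRoles` API (each with `RoleName` and `Arn`). The ARN pattern is a simplified check, not a full validator.

```python
import re

NODE_ROLE_PREFIX = "AHANA-CF-EKSNG-STACK-"

# Simplified shape check for an IAM role ARN (assumption: not exhaustive).
_ROLE_ARN = re.compile(r"arn:aws[a-z-]*:iam::[0-9]{12}:role/[\w+=,.@/-]+", re.ASCII)


def find_node_instance_roles(roles: list) -> list:
    """Return ARNs of roles whose name starts with the Ahana node group prefix."""
    return [r["Arn"] for r in roles if r["RoleName"].startswith(NODE_ROLE_PREFIX)]


def build_cluster_trust_policy(role_arns: list) -> dict:
    """Build a trust policy that only the given node instance roles can assume."""
    if not role_arns:
        raise ValueError("at least one role ARN is required")
    for arn in role_arns:
        if not _ROLE_ARN.fullmatch(arn):
            raise ValueError(f"not an IAM role ARN: {arn!r}")
    # IAM accepts either a single string or a list of strings as the principal.
    principal = role_arns[0] if len(role_arns) == 1 else list(role_arns)
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": principal},
                "Action": "sts:AssumeRole",
            }
        ],
    }
```

Passing a single ARN reproduces the example policy above; passing several ARNs grants access to several clusters in one statement.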
tip

The Amazon Glue Data Catalog can be populated by defining databases and tables manually or by using a crawler. See the AWS Glue documentation for details.