What’s the advantage of having your own Hive metastore with Presto? How does it compare to AWS Glue?

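First, let’s define Apache Hive and AWS Glue. Apache Hive is data warehouse software that reads, writes, and manages large datasets using SQL. Hive was built for Hadoop, and its metastore component stores the table metadata (schemas, partitions, and data locations) that query engines rely on. AWS Glue is a fully managed ETL service for preparing and loading data for analytics. It automates ETL and handles the schemas and transformations, and it includes the Glue Data Catalog, a metadata store that is compatible with the Hive metastore. AWS Glue is serverless, so there’s no infrastructure to provision or manage (you only pay for the resources used while your jobs are running).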

Presto is a query engine, not a database, and it does not come with a catalog of its own, so you need something like the Hive metastore to describe the datasets you read, write, and manage. Presto sits on top of that catalog and reaches it through its Hive connector. You can also use the AWS Glue Data Catalog in place of a Hive metastore for Presto.
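
As a rough illustration of what that choice looks like in practice, here is a minimal Python sketch that renders the contents of a Presto Hive catalog file (for example etc/catalog/hive.properties) for either a Thrift-based Hive metastore or the Glue Data Catalog. The helper name hive_catalog_properties, the host name, and the region are hypothetical values chosen for illustration; the property names are the ones documented for the PrestoDB Hive connector, so check them against the version you run.

```python
def hive_catalog_properties(metastore, *, thrift_uri=None, glue_region=None):
    """Return the text of a Presto Hive catalog properties file.

    Hypothetical helper for illustration only; it just formats text and
    does not talk to Presto, Hive, or AWS.
    """
    # PrestoDB's Hive connector is registered under this connector name.
    lines = ["connector.name=hive-hadoop2"]

    if metastore == "thrift":
        # Self-managed Hive metastore, reached over Thrift.
        if not thrift_uri:
            raise ValueError("thrift_uri is required for a Thrift metastore")
        lines.append(f"hive.metastore.uri={thrift_uri}")
    elif metastore == "glue":
        # AWS Glue Data Catalog used in place of a Hive metastore.
        lines.append("hive.metastore=glue")
        if glue_region:
            lines.append(f"hive.metastore.glue.region={glue_region}")
    else:
        raise ValueError(f"unknown metastore type: {metastore!r}")

    return "\n".join(lines) + "\n"


# Example values below are assumptions, not defaults.
glue_config = hive_catalog_properties("glue", glue_region="us-east-1")
thrift_config = hive_catalog_properties(
    "thrift", thrift_uri="thrift://metastore.example.com:9083"
)
print(glue_config)
print(thrift_config)
```

Either way the connector is the same; only the metastore settings differ, which is why Presto can treat Glue as a stand-in for a Hive metastore.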

With Ahana Cloud, you don’t really need to worry about integrating Hive and/or AWS Glue with Presto. Presto clusters created with Ahana come with a managed Hive metastore and a pre-integrated Amazon S3 data lake bucket. Ahana takes care of connecting external catalogs like Hive and AWS Glue, so you can focus more on analytics and less on integrating your catalogs manually. You can also create managed tables as opposed to external tables.