What are the operational benefits of using a managed service for Presto with Ahana Cloud?

First let’s hear from an AWS Solution Architect: “Ahana Cloud uses the best practices of both a SaaS provider and somebody who would build it themselves on-premises. So the advantage with the Ahana Cloud is that Ahana is really doing all the heavy lifting, and really making it a fully managed service, the customer of Ahana does not have to do a lot of work, everything is spun up through cloud formation scripts that uses Amazon EKS, which is our Kubernetes Container Service. The customer really doesn’t have to worry about that. It’s all under the covers that runs in the background. There’s no active management required of Kubernetes or EKS. And then everything is deployed within your VPC. So the VPC is the logical and the security boundary within your account. And you can control all the egress and ingress into that VPC. So you, as the customer, have full control and the biggest advantage is that you’re not moving your data. So unlike some SaaS partners, where you’re required to push that data or cache that data on their side in their account, with the Ahana Cloud, your data never leaves your account, so your data remains local to your location. Now, obviously, with federated queries, you can also query data that’s outside of AWS. But for data that resides on AWS, you don’t have to push that to your SaaS provider.”

Now that you have that context, lets get more specific, let’s say you want to create a cluster initially, it’s a just a couple of clicks with Ahana Cloud. You can pick the the coordinator instance type and the Hive metastore instance type and it is all flexible. Instead of using the Ahana-provided Hive metastore, you can bring your own Amazon Glue catalog. Then of course its easy to add data sources. For that, you can add in JDBC endpoints for your databases. Ahana has those integrated in and then Ahana Cloud automatically restarts the cluster.

Compared to EMR or if you’re running with other distributions, all of this has to be done manually:

  • you have to create a catalog properties file for each data source
  • restart the cluster on your own
  • scale the cluster manually
  • add your own query logs and statistic
  • rebuild everything when you stop and restart clusters

With Ahana, all of this manual complexity is taken away. For scaling up, if you want to grow the analytics jobs over time, you can add nodes seamlessly. Ahana Cloud and other distributions can add the nodes to the cluster while your services are still up and running. But the part that isn’t seamless is when you stop the entire cluster. In addition to all the workers and the coordinator being provisioned, the configuration and the cluster connections to the data sources, and the Hive metastore are all maintained with Ahana Cloud. And so when you restart the cluster back up, all of that comes up pre-integrated with the click of a button: the nodes get provisioned again, and you have access to that same cluster to continue your analytics service. This is very important, because otherwise, you would have to manage it on your own, including the configuration management and reconfiguration of the catalog services. Specifically for EMR, for example, when you terminate a cluster, you lose track of that cluster altogether. You have to start from scratch and reintegrate the whole system.