
Webinar On-Demand
How to Build and Query Secure S3 Data Lakes with Ahana Cloud and AWS Lake Formation
AWS Lake Formation is a service that allows data platform users to set up a secure data lake in days. Creating a data lake with Presto and AWS Lake Formation is as simple as defining data sources and what data access and security policies you want to apply.
In this webinar, we’ll share more on the recently announced AWS Lake Formation and Ahana integration. The AWS & Ahana product teams will cover:
- Quick overview of AWS Lake Formation & Ahana Cloud
- The details of the integration
- How data platform teams can seamlessly integrate Presto natively with AWS Glue, AWS Lake Formation and AWS S3 through a demo
Join AWS Solution Architect Gary Stafford and Ahana Principal Product Manager Wen Phan for this webinar where you’ll learn more about AWS Lake Formation from an AWS expert and get an insider look at how you can now build a secure S3 data lake with Presto and AWS Lake Formation.
Webinar Transcript
SPEAKERS
Ali LeClerc | Ahana, Wen Phan | Ahana, Gary Stafford | AWS
Ali LeClerc | Ahana
All right, I think we have folks joining, so thanks everyone for getting here bright and early if you're on the West Coast, or in your afternoon I guess if you're on the East Coast. We will get started here in just a few minutes.
Ali LeClerc | Ahana
I'll play some music to get people in the right mindset to learn about Lake Formation and Ahana Cloud for Presto. Wen, do you want to share the title slide of your slide deck or are you going to start with something else? Up to you.
Wen Phan | Ahana
I’ll bring it up in a second.
Ali LeClerc | Ahana
Alright folks, thanks for joining. We're going to just wait a few more minutes until we get things kicked off here, just to let people join, so give us a few minutes and enjoy the music.
Ali LeClerc | Ahana
Alright folks, so we’re just waiting a few more minutes letting people get logged in and join and we’ll get started here in just a few.
Ali LeClerc | Ahana
All right. We are three minutes past the hour. So let's go ahead and get started. Welcome folks to today's Ahana webinar "How to Build and Secure AWS S3 Data Lakes with Ahana Cloud and AWS Lake Formation." I'm Ali LeClerc, and I will be moderating today's webinar. So before we get started, just a few housekeeping items. One is this session is recorded. So afterwards, you'll get a link to both the recording and the slides. No need to take copious amounts of notes, you will get both the slides and the recording. Second is we did have an AWS speaker, Gary Stafford, who will be joining us; he unfortunately had something come up last minute, but he will be joining as soon as he can finish that up. So you will have an AWS expert join. If you do have questions, please save those. And he will be available to take them later on. Last, like I just mentioned, we are doing Q&A at the end. So there's a Q&A box, you can just pop your questions into that Q&A box at the bottom of your control panel. And again, we have allotted a bunch of time at the end of this webinar to take those questions. So with that, I want to introduce our speaker Wen Phan. Wen is our principal product manager at Ahana, and he has been working extensively with the AWS Lake Formation team to build out this integration and is an expert in all things Ahana Cloud and AWS Lake Formation. Before I turn things over to him to get started, I want to share or launch a poll, just to get an idea of the audience that we have on the webinar today. How familiar are you with Presto, with data lakes, and with Lake Formation? So if you could take just a few seconds to fill that in, that would be super appreciated. And we can kind of get a sense of who we have on today's webinar. Wen is going to kind of tailor things on the fly based on the results here. So, looks good. We have some results coming in. Wen can you see this? Or do I need to end it for you to see it? Can you see any of the results?
Wen Phan | Ahana
I cannot see the results.
Ali LeClerc | Ahana
No worries. So I'm going to wait, we have 41% – 50% participation. I'm going to wait a few more seconds here, and then I will end the poll and show it. Looks like, just to kind of give the real-time results: familiarity with Presto, most people, 75%, very little. Data lakes, I think it's more spread across the board: 38% very little, 44% have played around, 90% using them today. Familiarity already with Lake Formation: 50% says very little. So it looks like most folks are fairly new to these concepts. And that is great to know. So I'll just wait maybe a few more seconds here. Looks like we have 64% participation. Going up a little, I'll do 10 more seconds. Give people a minute and a half of this and then I will end the poll here. We're getting closer, we're inching up. All righty. Cool. I'm going to end the poll. I'm going to share the results. So everybody can kind of see the audience makeup here. Alrighty. Cool. So with that, Wen, I will turn things over to you.
Wen Phan | Ahana
Awesome. Thanks, Ali. Thanks, everyone, for taking that poll, that was very, very useful. Like Ali said, I'm a product manager here at Ahana. I'm really excited to be talking about Ahana Cloud and Lake Formation today. It's been a project that I've been working on for several months now, and I'm excited to have it released. So here's the agenda we'll go through today. Pretty straightforward. We'll start with some overview of AWS Lake Formation, what it is, then transition to what Ahana is, and then talk about the integration between Ahana Cloud for Presto and AWS Lake Formation. So let's get into it, AWS Lake Formation. So this is actually an AWS slide. Like Ali mentioned, Gary had something come up, so I'll go ahead and present it. The bottom line is everybody, and companies, want more value from their data. And what you see here on the screen are some of the trends that we're seeing in terms of the data growing, coming from multiple sources, being very diverse. Images and text. It's being democratized more throughout the organization, and more workloads are using the data. So traditional BI workloads are still there, but you'll see a lot more machine learning and data science type workloads. The paradigm that is emerging to support this proliferation of data with low-cost storage, as well as allowing for multiple applications to consume it, is the data lake essentially.
Today, folks that are building and securing data lakes, it's taking a while, and this is what AWS is seeing. This is the impetus of why they built AWS Lake Formation. There are three kind of high-level components to Lake Formation. The first one is to just streamline the process and make building data lakes a lot faster. So try to compress what used to take months down to days, and provide tooling that can make it easier to move, store, update, and catalog data. The second piece is the security piece. This is actually the cornerstone of what we'll be demonstrating and talking about today. Once you have your data in your data lake, how do you go about securing it, enforcing policies and an authorization model? And although the data lake is very centralized, sharing the data across the organization is very important. So another tenet of AWS Lake Formation is to actually make it quite easy, or easier, to discover and share your data.
So that's a high level of Lake Formation. Now, we'll go into Ahana and kind of why we went and built this and worked with AWS at an early stage to integrate with Lake Formation. So first, for those of you who don't know, Ahana is the Presto company. And I think there are a few of you who are very new to Presto. So this is a single slide essentially giving a high-level overview of what Presto is. Presto is a distributed query engine. It's not a database, it is a way to allow you to access and query different data sources using ANSI SQL. The benefit of this distributed query nature is you can scale up as you need to for the data. So that's really the second point. Presto offers very low latency and performance that can scale to large amounts of data. The third piece is Presto was also created with a pluggable architecture for connectors. And what this really translates to is it supports many data sources. And one prominent use case for Presto, in addition to low-latency interactive querying, is federated querying, or querying across data sources.
The final high-level kind of takeaway for Presto: it is open source, it was originally developed at Meta, aka Facebook, and it's currently under the auspices of the Linux Foundation. And at the bottom of this slide, here are typical use cases of why organizations go ahead and deploy Presto, given the properties that I've kind of mentioned above. Here is an architecture diagram of Presto. I just saw a question – it's MPP, to answer that question.
Ali LeClerc | Ahana
Can you repeat the question? So everybody knows what it was.
Wen Phan | Ahana
Yeah, the question is, is your architecture MPP or SMP? It's MPP. And this is the way it's kind of laid out, again, very high level. So, the bottom layer, you have a bunch of sources. And you can see it's very, very diverse. We have everything from NoSQL type databases to typical relational databases, things in the cloud, streaming, Hadoop. And so Presto is kind of this query layer between your storage, wherever your data is, to be able to query it. And at the top layer are the consumers of the query engine, whether it be a BI tool, a visualization tool, a notebook. Today, I'll be using a very simple CLI to access Presto, use the Presto engine to query the data on the data lake across multiple sources, and get your results back. So this all sounds amazing. But today, if you were to use Presto and try to stand up Presto yourself, you're potentially going to run into some challenges. And basically, you know, maintaining, managing, and spinning up a Presto environment can still be complex today. First of all, it is open source. But if you were to just get the open-source bits, you still have to do a lot of legwork to get the remaining infrastructure to actually start querying. So you still need a catalog. I know some of you are new to data lakes – essentially, you have files in some kind of file store. Before, it used to be distributed file systems like HDFS in Hadoop; today, the predominant one is S3, which is an object store. So you have a bunch of files, but those files really don't mean anything in terms of a query until you have some kind of catalog. So if you were to use Presto, at least the open-source version, you still have to go figure out – well, what catalog am I going to use to map those files into some kind of relational entity, a mental model, for you to query? The other one is Presto has actually been around for quite a while, and it was born of the Hadoop era, so it has a ton of configurations. And so if you were to kind of spin this up, you'd have to go figure out what those configurations need to be, figure out the settings – there's a lot of complexity there – and, tied to the configuration, you wouldn't know how to really tune it, so you might have poor out-of-the-box performance. So all of these challenges, in addition to the proliferation of data lakes, is why Ahana was born and the impetus for our product, which is Ahana Cloud for Presto.
We aim to get you from zero to Presto in 30 minutes or less. It is a managed cloud service. I will be using it today, and you will be able to see it in action. But as a managed cloud service, there is no installation or configuration. We specifically designed this for data teams of all experience levels. In fact, a lot of our customers don't have huge engineering teams and just really need an easy way of managing this infrastructure and providing the Presto query engine for their data practitioners. Unlike other solutions, we take away most of the complexity, but we still give you enough knobs to tune things; we allow you to select the number of workers, the size of the workers that you want, things like that. And obviously, we have many Presto experts within the company to assist our customers. So that's just a little bit about Ahana Cloud for Presto. If you want to try it, it's pretty simple. Just go to our website at that address above. Like Ali said, you'll get this recording, and you can go ahead to that site, and then you can sign up. You will need an AWS account. But if you have one, we can go ahead and provision the infrastructure in your account. And you can get up and running with your first Presto cluster pretty quickly. I'll pause here, see if there's another question.
Ali LeClerc | Ahana
Looks like we have a few. What format is the RDBMS data stored in S3?
Wen Phan | Ahana
Yeah, so we just talked about data. I would say the de facto standard today is Parquet. You can do any kind of delimited format, CSV, ORC files, things like that. And then it just depends on your reader to go ahead and interpret those files. And again, you have to structure that directory layout with your catalog to properly map those files to a table. And then you'll have another entity called the database on top of the table. You'll see some of that as well. I won't go to that low level, but you'll see the databases and tables when I show the AWS Lake Formation integration.
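To make that mapping concrete, here is a minimal sketch of a Hive-style table definition over Parquet files in S3, roughly what a catalog entry captures behind the scenes. The catalog, schema, bucket path, and column names are hypothetical, not the exact ones used later in the demo:

```sql
-- Minimal sketch: map Parquet files under an S3 prefix to a queryable table.
-- All identifiers and the bucket path below are illustrative placeholders.
CREATE TABLE hive.sales.transactions (
    ts                 TIMESTAMP,
    customer_id        BIGINT,
    credit_card_number VARCHAR,
    category           VARCHAR,
    amount             DOUBLE
)
WITH (
    external_location = 's3://example-data-lake/sales/transactions/',
    format            = 'PARQUET'
);
```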
Ali LeClerc | Ahana
Great. And then, I just want to take a second – Gary actually was able to join. So welcome, Gary. Gary is a Solutions Architect at AWS and obviously knows a lot about Lake Formation. Great to have you on, Gary. And he's available for questions if anybody has specific Lake Formation questions. So carry on, Wen.
Wen Phan | Ahana
Hey, Gary, thanks for joining. Okay, so I'll try to really keep it tight. So, just quickly about Lake Formation, since many of you are new to it. And again, there are three pieces – making it easier to stand up the data lake, the security part, and the third part being the sharing. What we're focused on primarily with our integration, and you'll see this, is the security part. How do we use Lake Formation as a centralized source of authorization information, essentially. So what are the benefits? Why did we build this integration? And what is the benefit? So first of all, many folks we're seeing have invested in AWS as their data lake infrastructure of choice. S3 is huge. And a lot of folks are already using Glue today. Lake Formation leverages both Glue and S3. So it was a very natural decision for us seeing this particular trend. And so for folks that have already invested in S3 and Glue, this is a native integration for you guys. So this is a picture of how it works. But essentially, you have your files stored in your data lake storage – Parquet, CSV, or ORC – the data catalog is mapping that to databases and tables, all of that good stuff. And then the thing that we're going to be applying is Lake Formation's access control. So you have these databases, you have these tables. And what we'll see is that you can control which user has access to which table. Actually, we'll be able to see which users have access to which columns and which rows. And so that's basically the integration that we've built. So someone – the data lake admin – will go ahead and not only define the schemas but also define the access, and Ahana Cloud for Presto will be able to take advantage of those policies that have been centrally defined.
We make this very easy to use; this is a core principle in our product as well, as I kind of alluded to at the beginning. We're trying to really reduce complexity and make things easy to use and really democratize this capability. So doing this takes very few clicks, and it's through a very simple UI. So today, if you were going into Ahana – and I'm going to show this with the live application – these are the screens. Essentially, it's an extension of Glue, so you would have Glue, and we have a single click called "Enable AWS Lake Formation." When you go ahead and click that, we make it very easy: we actually provide a CloudFormation template, or stack, that you can run that will go ahead and hook up Ahana, your Ahana Cloud for Presto, to your Lake Formation. And that's it. The second thing that we do is you'll notice that we have a bunch of users here. So, you have all these users. And then you can map them to essentially your IAM roles, which are what the policies are tied to in Lake Formation. So, in Lake Formation, you're going to create policies based on these roles. You can say, for example, the data admin can see everything, the HR analyst can only see tables in the HR database, whatever. But you have these users that then will be mapped to these roles. And once we know what that mapping is, when you log into Presto as these users, the policies tied to those roles are enforced in your queries. And I will show this. But again, the point here is we make it easy, right? There's a simple user interface for you to go ahead and make the mapping, and a simple user interface for you to go ahead and enable the integration.
Wen Phan | Ahana
Are we FedRAMP certified in AWS? At this moment, we are not. That is an inbound request that we have had, and that we are exploring, depending on, I think, the need. Today, we are not FedRAMP certified. Then the final piece is the fine-grained access control. So, leveraging Lake Formation. I mentioned this: you're going to build your data lake, you're going to have databases, you're going to have tables. And you know, AWS Lake Formation has had database-level security and table-level security for quite some time, and we offer that. More recently, they've added more fine-grained access control. So not only can you control the database and the table you have access to, but also the columns and the specific rows you have access to. The row-level one was just announced a little over a month ago at the most recent re:Invent. We're actually one of the earliest partners to go ahead and integrate with this feature that essentially just went GA. I'll show this. Okay, so that was a lot of talking. I'm going to do a quick time check – we're good on time. I'm going to pause here. Before I go into the demo, let me see what we have for other questions. Okay, great. I answered the FedRAMP one.
Ali LeClerc | Ahana
Here's one that came in – Can Presto integrate with Azure AD / AWS SSO for user management and SSO?
Wen Phan | Ahana
Okay, so the specific AD question, I don't know the answer to that. This is probably a two-level question. So, there's Presto – just native Presto that you get out of the box and how you can authenticate to that. And then there is the Ahana managed service. What I can say is single sign-on has been a request, and we are working on providing more single sign-on capabilities through our managed service. For the open-source Presto itself, I am not aware of any direct Azure AD kind of integration there. If you are interested, I can definitely follow up with a more thorough answer. Whoever asked that, if you actually are interested, feel free to email us and we can reach out to you.
Ali LeClerc | Ahana
Thanks, Wen.
Wen Phan | Ahana
Okay, so we're going to do the demo. Before I get to the nitty gritty of the demo, let me give you some kind of overview and context. So let me just orient everyone to the application first – let's go ahead and do that. Many of you are new to Ahana. Once you have Ahana installed, this is what the UI looks like. It's pretty simple, right? You can go ahead and create a cluster, you can name your cluster whatever, [example] Acme. We have a few settings: how large you want your instances, what kind of auto scaling you want. Like we mentioned, out of the box, if you need a catalog, we can provide a catalog for you. You can create users so that users can log into this cluster; we have available ones here, and you can always create a new one. Step one is create your cluster. And then we've separated the notion of a cluster from a data source. That way, you can have multiple clusters and reuse configuration that you have with your data source. For example, if you go to a data source, I could go ahead and create a Glue data source. And as you select different data sources, you provide the configuration information specific to that data source. In my case, I'll do a Lake Formation one. So, I'm going to Lake Formation, and you'll select what region your Lake Formation service is in. You can use vanilla Glue as well; you don't have to use Lake Formation if you don't want to use the fine-grained access control. If you want to, and you want to use your policies, you enable Lake Formation, and then you go ahead and run the CloudFormation stack, and it'll go ahead and do the integration for you. If you want to do it yourself, or you're very capable, we do provide information about that in our documentation. So again, we try to make things easy, but we also try to be very transparent, if you want more control on your own. But that's it. And then you can map the roles, as I mentioned before, and then you go ahead and add the data source. And it will go ahead and create the data source. In the interest of time, I've already done this.
You can see I have a bunch of data sources. I have some Netflix data on Postgres – it's not really real data, it's just what we call it. We have another data source for MySQL, I have vanilla Glue, and I have a Lake Formation one. I have a single cluster right now, called "Analysts," that's been idle for some time, for two hours. Once it's up, you can see by default it has three workers. It's scaled down to one – not a really big deal, because the queries I'm going to run aren't going to be very, very large. This is actually a time-saving feature. But once it's up, you can connect to it; you'll have an endpoint. And whatever tool you want can connect via JDBC or the endpoint – we have Superset built in. I'm going to go ahead and use a CLI. But that was just a high-level overview of the product, since folks probably are new to it. But pretty simple. The final thing is you can create your users. And you can see how many clusters your users are attached to. All right, so let's go back to slideware for a minute and set the stage for what you're going to see. We're going to query some data, and we're going to see the policies in Lake Formation in action.
I've set up some data so we can have a scenario that we can kind of follow along and see the various capabilities, the various fine-grained access controls in Lake Formation. So imagine we work for a company, and we have different departments – a sales department and an HR department. And so let's say the sales department has their own database. And inside there, they have transactions data about sales transactions, you have information on the customers, and we have another database for human resources, or HR, to have employees. So here's a sample of what the transaction data could look like. You have your timestamp, you have some customer ID, you have a credit card number, you have perhaps the category that the transaction was made in, and you have whatever the amount for that transaction was. Customer data: you have the customer ID, which is just a primary key – first name, last name, gender, date of birth, where they live – again, fictitious data, but kind of representative of maybe some use cases that you'll run into. And then HR, instead of customers, pretend you have another table with just your employees. Okay? All right. So let's say I am the admin, and my name is Annie. And I want to log in, and I'm an admin. I should have access to everything, so let's go ahead and try this. So again, my cluster is already up, I have the endpoints.
Wen Phan | Ahana
I'm going to log in as Annie. And let's take a look at what we see. Presto has different terminology, and it might seem a little confusing, so I'll go ahead and decode it for everyone, for those of you that are not familiar. Each connector essentially becomes what is called a catalog. Now, this is very different than a data catalog that we talk about – it's just what they call it. In my case, the Lake Formation data source that I created is called LF, for Lake Formation. I also called it LF because I didn't want to type as much. Just to tie this back to what you are seeing: if we go back to here, you notice that the data source is called LF, and I've attached it to this cluster that I created, this Analysts cluster. So that's why you see the catalog name as LF. So that's great. And LF is attached to Lake Formation, which has native integration with Glue and S3. If I actually look at what databases – they're called schemas in Presto – I have in LF, I should see the databases that I just showed you. So, you see them – ignore the information schema, that's just kind of metadata information – you see sales, and you see HR. And I can actually take a look at what tables I have in the sales database, and I have customers and transactions. And you know, I'm an admin, so I should be able to see everything in the transactions table, for example. And I've set this policy already in Lake Formation. So I go here, and I should see all the tables, the same data that I showed you in the PowerPoint. So you see the transaction, the customer ID, the credit card number, category, etc. So great, I'm an admin, I can do stuff.
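Paraphrased as SQL, the admin session described above looks roughly like this (the catalog is named lf as in the demo; the exact statements are a sketch):

```sql
-- Annie (data admin) browsing the Lake Formation-backed catalog named "lf"
SHOW SCHEMAS FROM lf;                          -- sales, hr (plus information_schema)
SHOW TABLES FROM lf.sales;                     -- customers, transactions
SELECT * FROM lf.sales.transactions LIMIT 10;  -- the admin sees every column
```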
Let me see some questions. What do you type to invoke Presto? Okay, so let's be very specific for this question. So Presto is already up, right – I've already provisioned this cluster through Ahana. So when I went and said create cluster, this spun up all the nodes of the Presto cluster, set it up, configured it, did the coordinator, all of that behind the scenes. That's Presto. It's a cluster, a distributed query engine cluster. Then Presto exposes endpoints – [inaudible] endpoint, a JDBC endpoint – that you can then have a client attach to. Okay, you can have multiple clients. Most BI tools will be able to access this.
In this case, for the simplicity of this demo, I just use a CLI. So I would basically download the CLI, which is just another Java utility. So you need to have Java installed. And then you run the CLI with some parameters. So I have the CLI, it’s actually called Presto, that’s what the binary is, then I pass it some parameters. And I said, Okay, what’s the server? Here’s the endpoint. So it’s actually connecting from my local desktop to that cluster in AWS with this, but you can’t just access it, you need to provide some credentials. So I’m saying I’m going to authenticate myself with a password.
The user I want to access that cluster with is "Annie." Why is this valid? Well, this is valid because when I created this cluster, I specified which users are available in that cluster. So I have Annie, I have Harry, I have Olivia, I have Oscar, I have Sally, I have Wally. Okay, so again, just to summarize: I didn't invoke Presto from my desktop, my local machine. I'm just using a client, in this case the Presto CLI, to connect to the cluster that I provisioned via Ahana Cloud – in the cloud. And I'm just accessing that. As part of that, that cluster is already configured to use Lake Formation. The first thing I did was log in as Annie, and as we mentioned, Annie is my admin. And as an admin, she can access everything – she has access to all the databases, all the tables, etc.
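For reference, connecting with the Presto CLI as described looks roughly like the following. The endpoint URL is a placeholder; --server, --user, and --password are standard Presto CLI options:

```bash
# Connect the Presto CLI to the Ahana-provisioned cluster endpoint (placeholder URL).
# --password makes the CLI prompt for the user's password before connecting.
./presto --server https://analysts.example.ahana.cloud \
         --user Annie \
         --password
```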
Wen Phan | Ahana
Okay, so let's do another, more interesting case. Let's say instead of Annie, I log in as Sally, who is a sales analyst. As a platform owner, I know that Sally, in order to do her job, all she needs to look at are transactions. Because let's say she's going to forecast what the sales are, or she's going to do some analysis on what type of transactions have been good. So if we go back and look at the transactions table, this is what it looks like. Now, when I do this, though, I notice that there's a credit card number, and I know that I don't really want to expose a credit card number to my analysts, because they don't need it for their work. So I'm going to go ahead – also in this policy, for financial information – and say, you know, any sales analyst, in this case Sally, can only have access to the transactions table. And when she accesses the transactions table, she will not be able to see the credit card number. Okay. So let's go see what this looks like. So instead of Annie, I'm going to log in as Sally. Let's go ahead and just see what we did here. If we actually look at the data source, Annie got mapped to the role of data admin, so she can see everything. Sally is mapped to the role of "sales analyst," and therefore can only do what a sales analyst is defined to do in Lake Formation. The magic is it's defined in Lake Formation, but Ahana Cloud for Presto can take advantage of that policy.
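On the Lake Formation side, a policy like Sally's can be defined in the console or with the AWS CLI. As a hedged sketch (the account ID, role name, and column name are placeholders, and this is not necessarily how the demo environment was configured), a column-excluding grant looks roughly like:

```bash
# Sketch: grant SELECT on sales.transactions to a sales-analyst IAM role,
# excluding the credit card column. All identifiers are placeholders.
aws lakeformation grant-permissions \
  --principal DataLakePrincipalIdentifier=arn:aws:iam::111122223333:role/sales-analyst \
  --permissions SELECT \
  --resource '{
    "TableWithColumns": {
      "DatabaseName": "sales",
      "Name": "transactions",
      "ColumnWildcard": { "ExcludedColumnNames": ["credit_card_number"] }
    }
  }'
```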
So I'm going to go ahead and log in as Sally. Let's first take a look at the databases that I can see – they're called schemas – in LF. So the first thing you'll notice is Sally does not see HR, because she doesn't need to, and she has been restricted, so she can only see sales. Now let's see what tables Sally can see. Sally can only see transactions; Sally cannot actually see the customers table. But she doesn't know this. She's just running queries. And she's saying, "Well, this is all I can see. And it's what I need to do my job. So I'm okay with it." So let's actually try to query the transactions table now – LF, sales, transactions. When I try to do this, I actually get an Access Denied. Why? The reason I get an Access Denied here is because I cannot actually look at all the columns. I've been restricted; there's only a subset of the columns that I can look at. As I mentioned, we are not able to see the credit card number. So when I try to do a select star, I can't really do a star, because I can't see the credit card number. We are making an improvement where we won't do an explicit deny and will just return the columns that you have access to – otherwise, this can be a little annoying. But at the end of the day, you can see the authorization being enforced. You have Presto, and the policies that are set in Lake Formation are being enforced.
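Sally's session, paraphrased as SQL (column names are illustrative), shows both the explicit deny on the wildcard and the column-restricted query described next:

```sql
-- As Sally (sales analyst): the wildcard touches the restricted credit card column
SELECT * FROM lf.sales.transactions;        -- Access Denied

-- Listing only the permitted columns succeeds (column names are illustrative)
SELECT ts, customer_id, category, amount
FROM lf.sales.transactions
LIMIT 10;
```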
So now, instead of doing a star, I specifically paste in the columns I have access to – I can see the data and I can do whatever I need to do. I can do a group by to see what the categories are – great. I can do a time series analysis on revenue and then do a forecast for the next three months, whatever I need to do as a sales analyst. So that's great. Okay, so I'm going to go ahead and log out. So let's go back to this. So we know Sally's world. So now let's say, you know, the marketing manager – Ali here – has two marketing analysts, and she's got them responsible for different regions, and we want to get some demographics on our customers. So we have Wally. And if you look at the customers data, there's a bunch of PII – first name, last name, date of birth. So a couple of things: we can automatically say, you know what, they don't need to see this PII, we're going to go ahead and mask it with Lake Formation. Okay, and like I mentioned, Ali kind of segments her analysts to have different regions across the Pacific West Coast. So Wally is really responsible for only Washington. So we decided to say, hey, on a need-to-know basis, you're only really going to get rows back that are from customers that live in Washington. Alright, so let's go ahead and do that.
Wen Phan | Ahana
I'm going to log in as Wally, and let's actually see the databases again, just to see it – I'm just showing you different layers of the authorization. So Wally can see sales, not HR. Well, let's see what tables Wally can see. He should only see customers – Wally can only see customers, he cannot see the transactions, because he's been restricted. Let's try the same thing again, select star from sales.customers. And we expect this to error out. Why? Because again, PII data – we cannot do a star. We don't allow first name, last name, date of birth, all of that. If I do this and go ahead and take those columns out, I'll see the columns that I want, and I only see the rows that come from Washington. I technically did not have to select the state, I just want to prove that I'm only getting records from Washington.
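Wally's session, again paraphrased with illustrative column names; the row filter means only Washington customers come back even without a WHERE clause:

```sql
-- As Wally (marketing analyst for Washington): PII columns are excluded
SELECT * FROM lf.sales.customers;           -- Access Denied

-- Permitted columns only; Lake Formation's row filter returns Washington rows
SELECT customer_id, gender, state
FROM lf.sales.customers;
```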
Let's try another analyst, Olivia. And guess what, Olivia is responsible only for Oregon. So she's basically a peer to Wally, but she's responsible for Oregon. So I'm going to go ahead and do the same query, which I saved, and see what happens. So in this case, Olivia can only see Oregon. What you're seeing here is basically the fine-grained access control: you're seeing database restriction, you're seeing table-level restriction, you're seeing column-level restriction, and you're seeing row-level restriction. And you can do as many of these as you want. So we talked about Wally, and we know Olivia can only see Oregon. One more persona – actually two more personas – and then we're done. I think you guys all get the point.
I think I've probably provided sufficient proof that we can in fact enforce policies. So the last one is just Harry, who's in HR. So if I actually log in as Harry, Harry should only be able to see the HR data set. So I go in as Harry and I show the tables. Well, first of all, again, just to be complete, I'm only going to see HR. Say I want to see the sales data – you can see Harry cannot see the transactions. And then I can go ahead, and since I already know what the schema is, look at all the employees in this database. And I'll see everything, because I'm in HR, so I can see the personal information, and it doesn't restrict me.
Okay. And the final thing is, what happens if I have a user that I haven't mapped any policies to? So I actually have one user here, who is Oscar, and I actually didn't give Oscar any policies whatsoever. So let me go ahead here. So notice that Oscar is in the cluster, but Oscar is not mapped to any role whatsoever. I go back to my cluster, I go here – Oscar, he is here. So Oscar is a valid user in the cluster, but Oscar has no role. And so by default, if you have no role, we deny you access. That's essentially what's going to happen. But just to prove it: Oscar is using this cluster, show catalogs, you'll see the LF. Well, let's say I try to see what's in LF, what's in that connector – Access Denied. Because there is no mapping, you can't see anything, we're not going to tell you anything. We're not going to tell you what databases are in there – no tables, nothing. So that's the case where, you know, it's very secure: you don't have explicit access, you don't get any information. Okay, so I've been in demo mode for a while, just wanted to check if there's any questions or chat. All right, none.
So. So let’s just do a summary of what we saw. And then kind of wrap it up for Q&A. We’re good on time, actually. And give you some information of where you can get more information if you want to, you want to dig in, deep.
So first, the review. We had all these users, you see the roles, we saw a case where you have all access, and you saw the case where you have no access. And I did a bunch of other demos where you saw different, varying degrees of access – table, database, column, row, all of that stuff. And so that's what this integration really brings to folks that have a data lake today. You've gotten all your data there, inside your data lake, and you've decided that Presto is the way to go in terms of interactive querying, because it scales, it can manage all your data. And now you want to roll it out to all your analysts or your data practitioners, but you want to do it in a secure way. And you want to enforce it, and you want to do it in one place. Lake Formation doesn't only integrate with Ahana; it can integrate with other tools within the AWS ecosystem. So you're defining these policies in one place, and Ahana-managed Presto clusters can take advantage of that.
There was a more technical talk on this, if you're interested in some of the technical details, that we just presented at PrestoCon with my colleague Jalpreet, who is the main engineer on this, as well as another representative from AWS, Roy. If you're interested, go ahead and just Google this and go to YouTube, and you can go watch it. They'll give you more of the nitty gritty underneath the hood, if you're interested in that. And that is all I have for planned content.
Ali LeClerc | Ahana
Wen, what a fantastic demo, thanks for going through all of those. Fantastic. So I wanted to give Gary kind of a chance to share his perspective on the integration and his thoughts on, you know, what this kind of means from the AWS point of view. So Gary, if you don't mind putting on your video, that would be awesome. If you can just say hi to everyone, and we'll let you kind of share your thoughts.
Gary Stafford | AWS
That's much better than that corporate picture that was up there. Yeah, thank you. And I would also recommend, as Wen said, to view the PrestoCon video with Roy and Jalpreet. I think they go into a lot of detail with respect to how the integration works under the covers. And also, maybe I'll share two links, Ali, I'll paste them in there. One link, kind of what's new with AWS Lake Formation – Roy mentioned some of the new features that were announced – I'll drop a link in there to let folks know what's new. It's a very actively developed project, there's a lot of new features coming out. So I'll share that link. And also, Jalpreet mentioned a lot of the API features. Lake Formation has a number of APIs, so I'll drop a link in there too that discusses some of those available endpoints and APIs a little better. I'll just share my personal vision. I think of services like Amazon EventBridge that have a partner integration, which makes it very easy for SaaS partners to integrate with customers on the AWS platform. I think it'd be phenomenal at some point if Lake Formation progresses to that standpoint with some of the features that Roy mentioned and Wen demonstrated today – where partners like Ahana could integrate with Lake Formation and get an out-of-the-box data lake, a way to create a data lake, a way to secure a data lake, and simply add their analytics engine with their special sauce on top of that, and not have to do that heavy lifting. And I hope that's the direction that Lake Formation is headed in. I think that'll be phenomenal, to have a better integration story with our partners on AWS.
Ali LeClerc | Ahana
Fantastic. Thanks, Gary. With that, we have a few questions that have come in. Again, if you do have a question, you can pop it into the Q&A box, or even the chat box. So Wen, I think this one's for you: can you share a little bit more detail about what happens when you enable the integration?
Wen Phan | Ahana
Sure, I will answer this question in two ways. I will show you what we're doing under the hood, so that you know, and kind of this API exchange. And this is a recent release. So let me go ahead and share my screen again. And whoever asked the question, if I didn't answer it, let me know. So when you go to the data source, like I mentioned, it's pretty simple, and we do that on purpose. So when you enable Lake Formation, you can go ahead and launch this CloudFormation template, which will go ahead and do the integration. What is it actually doing under the hood? So first of all, this is actually a good time for me to introduce our documentation. If you go to ahana.io, all of this is documented. So you go to docs – Lake Formation is tightly coupled with Glue, so go to manage data sources, you go to Glue, and this will walk you through it. And there's a section here that tells you, if you didn't want to use this – like you didn't want to actually use the CloudFormation, or you simply want to understand what this is really doing – you can go ahead and read about it. Essentially, like Roy mentioned, there's a bunch of APIs; one of the APIs is this data lake settings API with Lake Formation. If you use the AWS CLI, you can actually see this, and you'll get a response. What we're doing is setting a bunch of flags – you have to allow Ahana Presto to actually do the filtering on your behalf. So we're going to get the data, we're going to look at the policies and block out anything you're not supposed to see. And we also are a partner, so the service needs to know that this is a valid partner that is interacting with the Lake Formation service. So that's all this is doing. You could do this all manually if you really wanted to with the CLI. We just take care of this for you, on your behalf. So that's what's going on to enable the integration. The second part – and again, this goes into a lot more detail in that talk – is what's actually happening under the hood. I'm just going to show a quick kind of slide for this. But essentially what's happening is, when you make a query – you've defined everything in AWS – our service, so in our case, we're a third-party application, we go ahead and talk to Lake Formation. You set this up, we go talk to Lake Formation, we get temporary credentials. And then we know what the policies are, and we are able to access only the data that you're allowed to see. And then we process the query, and you see it in the client – in my case, that's what you saw in the CLI.
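As a rough sketch of what that CloudFormation stack effectively configures (the field names come from Lake Formation's data lake settings API; the principal ID and session tag value below are placeholders, and the actual values Ahana uses may differ), you could inspect and set these flags with the AWS CLI like so:

```bash
# Inspect the current data lake settings, including the external filtering flags
aws lakeformation get-data-lake-settings

# Sketch only: allow an external engine to do policy filtering on your behalf.
# put-data-lake-settings replaces the whole settings document, so in practice
# you would merge these fields into the JSON returned above before writing it back.
aws lakeformation put-data-lake-settings --data-lake-settings '{
  "AllowExternalDataFiltering": true,
  "ExternalDataFilteringAllowList": [
    { "DataLakePrincipalIdentifier": "111122223333" }
  ],
  "AuthorizedSessionTagValueList": ["Ahana"]
}'
```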
Ali LeClerc | Ahana
Cool, thanks Wen, thorough answer. The next question that came in is: is this product a competitor to Redshift? I'm assuming when you say product, you mean Ahana? But maybe you can talk about both Ahana and Presto, Wen?
Wen Phan | Ahana
Yeah, I mean, it all comes down to your use case. So Redshift is kind of more like a data warehouse. And that's great, it has its own use cases. And again, Presto can connect to Redshift. So it depends on what you want. I mean, Presto can talk to the data lake. So if you have use cases that make more sense on a data lake, Presto is one way to access it. And actually, if you have use cases that need to span both the data lake and Redshift, Presto can federate that query as well. So it's just another piece in the ecosystem. I don't necessarily think it's a competitor. I think, as with many things, it's: what's your use case? Pick the right tool for your use case.
Ali LeClerc | Ahana
Great. I think you just mentioned something around Glue, Wen. So somebody asked, do I need to use Glue for my catalog if I'm using Lake Formation with Ahana Cloud?
Wen Phan | Ahana
Yes, you do. Yes, you do. It’s a tightly coupled AWS stack, which works very well. And so you do have to use Glue.
Ali LeClerc | Ahana
All right. So I think we've answered a ton of questions along the way, as well as just now. If there are no more, and it looks like no more have come in, then I think we can probably wrap up here. So any last kind of parting thoughts, Wen and Gary, before we say goodbye to everybody? So on that note, I'm going to post our link in here. I don't know if Wen mentioned, maybe he did, we have a 14-day free trial. So no commitment – you can check out Ahana Cloud for Presto on AWS free for 14 days, play around with it, get started with Lake Formation. If you're interested in learning more, we'll make sure to put you in touch with Wen, who again is the local expert at Ahana. And then Gary, of course, is always able to help as well. So feel free to check out our 14-day free trial. And with that, I think that's it. All right, everyone. Thanks Wen, fantastic demo, fantastic presentation. Appreciate it. Gary, thanks for being available. Appreciate all of your support in getting this integration off the ground and into the hands of our customers. So fantastic. Thanks, everybody, for joining and sticking with us till the end. You'll get a link to the recording and the slides, and we'll see you next time.
Speakers
Gary Stafford
Solutions Architect, AWS

Wen Phan
Principal Product Manager, Ahana
