The Role of Blueprints in Lake Formation on AWS

Why does this matter?

There are two major steps to creating a Data Lakehouse on AWS: first, set up your S3-based Data Lake; second, run analytical queries on that data lake. Presto is a popular SQL engine for the second step. This article focuses on the first step and on how AWS Lake Formation Blueprints can make it easy and automated. Before you can run analytics to get insights, you need your data continuously pooling into your lake!

AWS Lake Formation helps with the time-consuming data wrangling involved in maintaining a Data Lake, making it simpler and more secure. One of its features is Workflows: a workflow encompasses a complex set of ETL jobs that load and update data.
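
Workflows created in Lake Formation run as AWS Glue workflows, so they can also be started and checked from code. The following is a minimal sketch, assuming a Glue client such as the one returned by `boto3.client("glue")`; the function name `start_and_check` and the workflow name are hypothetical, and the client is passed in so the logic can be tried against a stub.

```python
def start_and_check(glue, workflow_name):
    """Start one run of a workflow and return (run_id, status).

    `glue` is assumed to behave like a boto3 Glue client. Only two calls
    are used: start_workflow_run and get_workflow_run.
    """
    run_id = glue.start_workflow_run(Name=workflow_name)["RunId"]
    run = glue.get_workflow_run(Name=workflow_name, RunId=run_id)["Run"]
    # Status is a string such as RUNNING, COMPLETED, STOPPED, or ERROR.
    return run_id, run["Status"]


# Hypothetical usage against a real account (requires boto3 and credentials):
#   import boto3
#   start_and_check(boto3.client("glue"), "my-blueprint-workflow")
```

A real script would poll `get_workflow_run` until the status leaves `RUNNING`; this sketch checks it only once.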

work flow diagram

What is a Blueprint?

A Lake Formation Blueprint is a template that lets you easily stamp out workflows. It is an automation capability within Lake Formation. There are 3 types: database snapshot, incremental database, and log file blueprints.

The database blueprints support automated ingestion of data from sources like MySQL, PostgreSQL, and SQL Server into the open data lake. It’s a point-and-click service with simple forms in the AWS console.

A Database snapshot does what it sounds like: it loads all the tables from a JDBC source into your lake. This is useful when you want time-stamped, end-of-period snapshots to compare later.
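
To make the idea of time-stamped snapshots concrete, here is a small sketch of how each table in a snapshot could be mapped to its own dated location in the lake. It is not Lake Formation's actual layout: the function `snapshot_targets`, the `snapshot_date=` path convention, the bucket name, and the exclude patterns are all assumptions for illustration.

```python
from datetime import date
from fnmatch import fnmatch


def snapshot_targets(tables, snapshot_date, lake_root, exclude_patterns=()):
    """Map every source table to a dated prefix, skipping excluded tables."""
    targets = {}
    for table in tables:
        if any(fnmatch(table, pattern) for pattern in exclude_patterns):
            continue  # table matches an exclude pattern, leave it out
        targets[table] = (
            f"{lake_root}/{table}/snapshot_date={snapshot_date.isoformat()}/"
        )
    return targets


# Hypothetical end-of-quarter snapshot of a sales database.
targets = snapshot_targets(
    ["orders", "customers", "tmp_scratch"],
    date(2024, 3, 31),
    "s3://example-lake/sales",
    exclude_patterns=["tmp_*"],
)
```

Because each run writes under a different date, two end-of-period snapshots of the same table can sit side by side and be compared later.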

An Incremental database blueprint also does what it sounds like: it loads only the new data, or deltas, into the data lake. This is faster than a full reload and keeps the data in your lake current. For each table, the blueprint uses bookmark columns to track what was already loaded, so each successive run picks up where the last one left off.
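
The bookmark idea can be sketched in a few lines of plain Python. This is an illustration of the general technique, not the blueprint's actual implementation; the function `incremental_load`, the `id` bookmark column, and the sample rows are assumptions.

```python
def incremental_load(source_rows, bookmark_column, last_bookmark=None):
    """Return (new_rows, new_bookmark) for one incremental run.

    Only rows whose bookmark value is greater than the previous bookmark
    are picked up; the highest value seen becomes the next bookmark.
    """
    new_rows = [
        row for row in source_rows
        if last_bookmark is None or row[bookmark_column] > last_bookmark
    ]
    new_rows.sort(key=lambda row: row[bookmark_column])
    new_bookmark = new_rows[-1][bookmark_column] if new_rows else last_bookmark
    return new_rows, new_bookmark


# First run: nothing loaded yet, so every row is new.
source = [{"id": 1, "total": 10}, {"id": 2, "total": 25}]
first_batch, bookmark = incremental_load(source, "id")

# The source grows; the second run picks up only the delta.
source.append({"id": 3, "total": 7})
second_batch, bookmark = incremental_load(source, "id", bookmark)
```

This approach works best when the bookmark column only ever increases, such as an auto-incrementing key or an insert timestamp. Updates to already-loaded rows would not be detected by this sketch.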

The Log file blueprint takes logs from various sources and loads them into the data lake. ELB logs, ALB logs, and CloudTrail logs are examples of popular log files that can be loaded in bulk.
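
As a rough picture of what loading a log file involves, the sketch below flattens one CloudTrail log file into table-like rows. CloudTrail delivers gzipped JSON files with a top-level `Records` list; the function name `cloudtrail_rows` and the choice of output columns are assumptions, and this is not the code the blueprint itself runs. ELB and ALB logs are plain text rather than JSON and would need a different parser.

```python
import gzip
import json


def cloudtrail_rows(raw_bytes):
    """Yield one flat dict per event in a gzipped CloudTrail log file."""
    document = json.loads(gzip.decompress(raw_bytes))
    for record in document.get("Records", []):
        yield {
            "event_time": record.get("eventTime"),
            "event_source": record.get("eventSource"),
            "event_name": record.get("eventName"),
            "aws_region": record.get("awsRegion"),
        }
```

Each yielded row could then be written to the lake in a columnar format and queried like any other table.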

Summary, and what about Ahana Cloud?

AWS Lake Formation makes getting data into your data lake easy, automated, and consistent. Once your data is ingested, you can use a managed service like Ahana Cloud for Presto to run fast queries on your data lake and derive important insights for your users. Ahana Cloud integrates with AWS Lake Formation governance and security policies. Learn more here: https://ahana.io/aws-lake-formation

lake formation diagram