AWS Lake Formation vs AWS Glue – What are the differences?
As you start building your analytics stack on AWS, there are several AWS services to understand. In this article we’ll discuss two key ones: AWS Lake Formation, for security and governance, and AWS Glue, a data integration service that includes a data catalog. For reference, AWS Lake Formation is built on AWS Glue, and both services share the same AWS Glue Data Catalog.
AWS Lake Formation
AWS Lake Formation makes it easier for you to build, secure, and manage data lakes.
AWS Lake Formation gives you a central console where you can discover data sources, set up transformation jobs to move data to an Amazon Simple Storage Service (S3) data lake, remove duplicates and match records, catalog data for access by analytic tools, configure data access and security policies, and audit and control access from AWS analytic and ML services.
For AWS users who want governance on their data lake, AWS Lake Formation makes it easy to set up a secure data lake quickly (in a matter of days) and provides a governance layer for Amazon S3.
Lake Formation creates Glue workflows that integrate source tables, extract the data, and load it into an Amazon S3 data lake.
When to use AWS Lake Formation?
- Build data lakes quickly – this means days, not months. You can move, store, update, and catalog your data faster, plus automatically organize and optimize your data.
- Add authorization to your data lake – You can centrally define and enforce security, governance, and auditing policies.
- Make data easy to discover and share – Catalog all of your company’s data assets and easily share datasets between consumers.
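The central authorization described above can also be scripted. The following is a minimal sketch, assuming the boto3 SDK and its Lake Formation `grant_permissions` call. The helper names `build_grant_request` and `grant_table_access`, the role ARN, and the `sales_db` / `orders` names are hypothetical and used only for illustration.

```python
from typing import Any, Dict, Iterable


def build_grant_request(principal_arn: str, database: str, table: str,
                        permissions: Iterable[str]) -> Dict[str, Any]:
    """Build keyword arguments for Lake Formation's grant_permissions call.

    Duplicate permissions are removed and the result is sorted so the
    request is deterministic.
    """
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {"Table": {"DatabaseName": database, "Name": table}},
        "Permissions": sorted(set(permissions)),
    }


def grant_table_access(lf_client: Any, principal_arn: str, database: str,
                       table: str,
                       permissions: Iterable[str] = ("SELECT",)) -> Any:
    """Grant table-level permissions to a principal.

    lf_client is assumed to be boto3.client("lakeformation"); it is passed
    in so the function can be exercised with a stub.
    """
    request = build_grant_request(principal_arn, database, table, permissions)
    return lf_client.grant_permissions(**request)
```

With real credentials, `lf_client` would typically be created with `boto3.client("lakeformation")`, and the caller would need to be a data lake administrator or hold grantable permissions on the table.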
What is AWS Glue?
AWS Glue is a serverless data integration service that makes it easy to discover, prepare, and join data for analytics, machine learning, and application development. AWS Glue includes a central metadata repository known as the AWS Glue Data Catalog, which discovers and catalogs metadata about your data stores or data lake. Using the AWS Glue Data Catalog, users can easily find and access data.
When to use AWS Glue?
- Create a unified data catalog to find data across multiple data stores – View the Data Catalog to quickly search and discover the datasets that you own, and maintain the relevant metadata in one central repository.
- Data Catalog for data lake analytics with S3 – Organize, cleanse, validate, and format data for storage in a data warehouse or data lake.
- Build ETL pipelines to ingest data into your S3 data lake.
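Browsing the Data Catalog, as in the first use case above, can be done programmatically as well. This is a rough sketch, assuming boto3's Glue `get_tables` call and its `TableList` / `NextToken` response fields. The helper names `iter_tables` and `summarize_table` are our own, and the database name in the usage note is hypothetical.

```python
from typing import Any, Dict, Iterator


def iter_tables(glue_client: Any, database: str) -> Iterator[Dict[str, Any]]:
    """Yield every table definition in a Data Catalog database.

    glue_client is assumed to be boto3.client("glue"). Pages are followed
    through NextToken until the service stops returning one.
    """
    kwargs: Dict[str, Any] = {"DatabaseName": database}
    while True:
        page = glue_client.get_tables(**kwargs)
        yield from page.get("TableList", [])
        token = page.get("NextToken")
        if not token:
            break
        kwargs["NextToken"] = token


def summarize_table(table: Dict[str, Any]) -> Dict[str, Any]:
    """Reduce a Glue table definition to its name, location, and columns."""
    storage = table.get("StorageDescriptor", {})
    return {
        "name": table["Name"],
        "location": storage.get("Location"),
        "columns": [(c["Name"], c.get("Type"))
                    for c in storage.get("Columns", [])],
    }
```

A caller might run `[summarize_table(t) for t in iter_tables(boto3.client("glue"), "sales_db")]` to get a quick inventory of the tables in one database.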
The data workflows initiated from an AWS Lake Formation blueprint are simply AWS Glue workflows. You can view and manage these workflows in either the Lake Formation console or the AWS Glue console.
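Since blueprint workflows are ordinary Glue workflows, they should also be visible through the Glue API. The sketch below assumes boto3's Glue `list_workflows` and `get_workflow` calls and reads the `LastRun` status from each workflow. The function name `workflow_statuses` is our own, and the Glue API does not distinguish blueprint-created workflows from hand-built ones here.

```python
from typing import Any, Dict, Optional


def workflow_statuses(glue_client: Any) -> Dict[str, Optional[str]]:
    """Map each Glue workflow name to the status of its last run.

    glue_client is assumed to be boto3.client("glue"). A workflow that has
    never run maps to None.
    """
    statuses: Dict[str, Optional[str]] = {}
    kwargs: Dict[str, Any] = {}
    while True:
        page = glue_client.list_workflows(**kwargs)
        for name in page.get("Workflows", []):
            workflow = glue_client.get_workflow(Name=name)["Workflow"]
            statuses[name] = workflow.get("LastRun", {}).get("Status")
        token = page.get("NextToken")
        if not token:
            break
        kwargs["NextToken"] = token
    return statuses
```

Passing the client in, rather than creating it inside the function, keeps the sketch easy to try against a stub before pointing it at a real account.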
AWS Lake Formation vs AWS Glue: A Summary
AWS Lake Formation simplifies security and governance on the data lake, whereas AWS Glue simplifies metadata management and data discovery for data lake analytics.
Check out our community roundtable, where we discuss how you can build a simple data lake with the new stack: Presto + Apache Hudi + AWS Glue and S3 = The PHAS3 stack.