What is the difference between a managed table and external tables?
Advice: Looking for more information about understanding PrestoDB and its components? Grab a free download of the O’Reilly ebook, Learning and Operating Presto.
Managed Table & External Tables: Main Difference
There are a few differences between these. However, the main difference between a managed and external table is that when you drop an external table, the underlying data files stay intact. This is because the user is expected to independently manage the data files and directories. With a managed table, the underlying directories and data get wiped out when the table is dropped. See the additional differences between the two and learn when to use each below.
Managed Table & External Table: Within Presto
Regarding Presto tables are developed as such:
External Table: Table created using WITH has ‘external_location’
Managed Table: Table created in schema which has WITH used has ‘location’
You cannot “insert into” an external table (By default, the setting
hive.non-managed-table-writes-enabled=false prevents you from doing so).
The expectation that the data and information in the external table is managed externally. e.g. Spark, Hadoop, Python Scripts, or another external ETL process.
Internal Tables vs. External Tables in Apache Hive
Below are the major differences between Internal vs External tables in Apache Hive:
|INTERNAL OR MANAGED TABLE||EXTERNAL TABLE|
|By default, Hive creates an Internal or Managed Table.||Use EXTERNAL option/clause to create an external table|
|Hive owns the metadata, table data by managing the lifecycle of the table||Hive manages the table metadata but not the underlying file.|
|Dropping an Internal table drops metadata from Hive Metastore and files from HDFS||Dropping an external table drops just metadata from Metastore without touching the actual file on HDFS/S3|
|Metadata on Inserts, creation of new partitions, etc. are updated automatically during inserts through the metastore||You need to explicitly run sync_partitions to sync changes on S3 with the metastore|
There tends to be a sense of confusion regarding when to pick managed tables and when to decide to use external tables. However, you can rely on this rule of thumb for future planning.
In short, you should opt to use managed tables when the metastore should manage the lifecycle of the table. Also, when you are generating temporary tables. However, you should use external tables when files are already present or in remote locations, and the files should remain even if the table is dropped.
If you want to get started with Presto easily, check out Ahana Cloud. It’s SaaS for Presto and takes away all the complexities of tuning, management, and more. It’s free to try out for 14 days, then it’s pay-as-you-go through the AWS marketplace. Give it a try and you’ll be up and running Presto in under an hour.
Want to take a deeper dive into understanding Presto. Learn what PrestoDB is, how it got started, and the benefits for Presto users.
AWS lake formation helps users to build, manage, and secure their data lakes in a very short amount of time, meaning days instead of months as is common with a traditional data lake approach. Learn more about AWS Lake Formation, including the pros and cons of AWS Lake Formation.
A data warehouse is a data repository, typically used for analytic systems and Business Intelligence tools. Take a look at this article to get a better understand of what it is, how it’s used, and the pros and cons of a data warehouse compared to a data lake.