Can I write back or update data in my Hadoop / Apache Hive cluster through Presto?

Using Presto with a Hadoop cluster for SQL analytics is pretty common especially in on premise deployments. 

XH3eNJyksIZgh paiVECqDQikAxUAcncx cC6FIyZEtBMLuXPzvfIp2qD21MAGC0Uyzf8fTPLkYqjVzvPjC4EI7JMNi1wD3umWdcQz7AFK Ptz8pIa3Qvsj24C4TjLkqTfbbjwBn
6wl4fYwoXx EzI0 PeQXr4wVN7uOBOyzpboxv7rJij7hrztdka4uBYyl hgTO 328iCWQFOFnsnckkGXdesV6C4dSPh9Yv stAHcg5EIEdXJhkgM7eI8OoSsLwhotIe1kDWpKNK

With Presto, you can read and query data from the Hadoop datanodes but you can also make changes to data in Hadoop HDFS. There are however some restrictions. 

All this is enabled via Presto’s Hive Connector. 

The first step is to create a catalog properties file and point to the Hive Metastore. 

You can also optionally configure some Hive Metastore properties for the Hive Connector. 

Create etc/catalog/hive.properties with the following contents to mount the hive-hadoop2 connector as the hive catalog, replacing example.net:9083 with the correct host and port for your Hive metastore Thrift service:

connector.name=hive-hadoop2

hive.metastore.uri=thrift://example.net:9083

For basic setups, Presto configures the HDFS client automatically and does not require any configuration files. 

Creating a new table in HDFS via Hive

Using Presto you can create new tables via the Hive Metastore. 

Examples

Create table: 

Create a new Hive table named page_views in the web schema that is stored using the PARQUET file format, partitioned by date and country. HIVE is the name of the connector – that is the name of the properties file. 

CREATE TABLE hive.web.page_views (

  view_time timestamp,

  user_id bigint,

  page_url varchar,

  ds date,

  country varchar

)

WITH (

  format = 'ORC',

  partitioned_by = ARRAY['ds', 'country']

)

Deleting data from Hive / Hadoop

With the Hive connector, you can delete data but this has to be at the granularity of entire partitions. 

Example: 

Drop a partition from the page_views table:

DELETE FROM hive.web.page_views

WHERE ds = DATE '2016-08-09'

  AND country = 'US'

Drop the external table request_logs. This only drops the metadata for the table. The referenced data directory is not deleted:

DROP TABLE hive.web.request_logs

Drop a schema:

DROP SCHEMA hive.web

Hive Connector Limitations

  • DELETE is only supported if the WHERE clause matches entire partitions.
  • UPDATE is not supported from Presto