Creating tables in an S3 bucket gives “Query failed: External location must be a directory”
Using Presto’s Hive connector, you want to create a new table in a directory that already exists, in an S3 bucket that already exists and to which you have full read/write access. But Presto tells you the directory isn’t a directory!
In this example you have already created your target directory myDir using the AWS S3 console, in a bucket called myBucket. But attempts to create the customer table in that directory fail:
presto> CREATE TABLE hive.customer (
          customer_id bigint,
          email varchar(64)
        )
        WITH (
          external_location = 's3a://myBucket/myDir/',
          format = 'PARQUET'
        );

Query 20210104_110546_00015_mhsym failed: External location must be a directory
The issue here is not immediately obvious. A “folder” created with the S3 console’s Create folder button is really just a zero-byte object whose key ends in a slash (myDir/ in this example), and Presto’s Hive connector does not treat such an object as a directory.
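The distinction can be sketched in a few lines of Python. This is a toy model only – the bucket is a plain dict of key to object size, and looks_like_presto_directory is a hypothetical stand-in for the connector’s real check, which goes through the S3 API – but it shows why a console-created folder marker fails while a prefix containing a real object succeeds:

```python
def looks_like_presto_directory(keys, prefix):
    """Mimic a directory check over a flat S3-style key namespace.

    A location counts as a directory only if some object's key lives
    *under* the prefix, not if the prefix itself is merely a zero-byte
    marker object (which is what the S3 console's Create folder makes).
    """
    marker = prefix if prefix.endswith("/") else prefix + "/"
    return any(k != marker and k.startswith(marker) for k in keys)


# "Create folder" in the S3 console: one zero-byte object named 'myDir/'
console_bucket = {"myDir/": 0}

# Copying a file with `aws s3 cp`: a real object under the prefix
cli_bucket = {"myDir/dummyfile": 42}

print(looks_like_presto_directory(console_bucket, "myDir/"))  # False
print(looks_like_presto_directory(cli_bucket, "myDir/"))      # True
```

Under this model the console-created bucket has no object strictly below the myDir/ prefix, so the check fails – which is exactly the symptom the CREATE TABLE error reports.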
The workaround is:
1) First remove the problematic target directory (myDir in this example).
2) Use the aws command-line tool to create a new myDir target directory, using a cp (copy) command and any file – dummyfile in this example.
3) Subsequent write attempts – like a CREATE TABLE – will then work:
-- Make a new 'customer' target dir
$ aws s3 cp dummyfile "s3://myBucket/myDir/dummyfile"

-- Now CREATE TABLE works fine
presto> CREATE TABLE hive.customer (
          customer_id bigint,
          email varchar(64)
        )
        WITH (
          external_location = 's3a://myBucket/myDir/',
          format = 'PARQUET'
        );

CREATE TABLE
Don’t forget to delete the dummy file.
The reason this happens is that S3 is not a regular file system. It is a key-value store, and its “directories” are not really directories at all – they are just shared key prefixes.
More details about this issue and why it happens can be found at: https://github.com/prestodb/presto/issues/8939.
The above should help you resolve any further “Query failed: External location must be a directory” errors you encounter.