Creating tables in an S3 bucket gives "Query failed: External location must be a directory"

Using Presto's Hive connector, you want to create a new table in a directory that already exists, inside an S3 bucket that already exists and that you have full read/write access to. But Presto tells you the directory isn't a directory!

In this example you have already created the target directory myDir using the AWS S3 console, in a bucket called myBucket. But attempts to create the customer table in that directory fail:

presto> CREATE TABLE hive.customer ( customer_id bigint, email varchar(64) ) 
WITH ( external_location = 's3a://myBucket/myDir/' , format = 'PARQUET' );

Query 20210104_110546_00015_mhsym failed: External location must be a directory

The issue here is not immediately obvious: directories created with the S3 console cannot be used by the Presto Hive connector. The console does not create a real directory at all; it creates a zero-byte placeholder object whose key ends in "/", and the connector's directory check does not treat that placeholder as a directory.
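You can see what the console actually created by listing the raw objects under the prefix, using the bucket and directory names from this example:

--List the raw objects under the myDir/ prefix
$ aws s3api list-objects-v2 --bucket myBucket --prefix "myDir/"

You should see a single zero-byte object whose key is exactly "myDir/": the placeholder the console created, not a directory.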

The workaround is:

1) First, remove the problematic target directory (myDir in this example).

2) Recreate myDir using the AWS command line tool, by copying any file into it with a cp (copy) command (dummyfile in this example), as shown below.

3) Subsequent write attempts, such as CREATE TABLE, will then work:

--Make a new target dir for the 'customer' table
$ aws s3 cp dummyfile "s3://myBucket/myDir/dummyfile"

--Now CREATE TABLE works fine
presto> CREATE TABLE hive.customer ( customer_id bigint, email varchar(64) )
 WITH ( external_location = 's3a://myBucket/myDir/' , format = 'PARQUET' );
CREATE TABLE

Don’t forget to delete the dummy file.
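Using the same bucket and file names as above, that is:

--Remove the dummy file once the table exists
$ aws s3 rm "s3://myBucket/myDir/dummyfile"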

This happens because S3 is not a regular file system but a key-value object store: "directories" are really just shared key prefixes, and a prefix only exists while at least one object uses it.
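You can observe this key-prefix behaviour directly with the AWS CLI (tempDir below is just an illustrative name):

--Upload an object and its prefix "appears" as a directory
$ aws s3 cp dummyfile "s3://myBucket/tempDir/dummyfile"
$ aws s3 ls "s3://myBucket/"
--Delete the only object under the prefix and the "directory" vanishes with it
$ aws s3 rm "s3://myBucket/tempDir/dummyfile"
$ aws s3 ls "s3://myBucket/"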

More details about this issue and why it happens can be found at: https://github.com/prestodb/presto/issues/8939.
The steps above should help you resolve any further "Query failed: External location must be a directory" errors you encounter.