Why am I getting zero records when I use AWS Athena to query a CSV file?
There’s a common error many AWS Athena users see when they query CSV files – they will get zero records back. This tends to happen when they run the AWS Glue crawler and create a new table on the CSV file. There are several reasons why the query might return no records.
- The crawler is pointing to a file instead of an AWS S3 bucket
- The LOCATION path is wrong or there’s a double slash in it
- The partitions haven’t loaded into the Glue Data Catalog or the internal Athena data catalog
- In general, CSV crawlers can be sensitive to different issues (i.e. embedded newlines, partially quoted files, blanks in integer fields), so make sure everything is accurate
On the first point, if you have selected a file instead of your S3 bucket, the crawler will succeed but you won’t be able to query the contents which is why you’ll see the ‘Zero Records Returned’ error message. If there are other files that you don’t want crawled, you’ll need to create a new folder and move your CSV to that new folder, and then update the include path accordingly (and you’ll need to re-crawl it).
In order to get your S3 data lake to work, you’ll need to make sure each batch of same-schema files has its own top level directory.
If you’re running into issues like this with AWS Athena and want to offload the management while still getting the power of Presto, check out Ahana Cloud. It’s a fully managed service for AWS that removes these types of complexities and makes it really easy to run Presto in the cloud.
Check out the case study from ad tech company Carbon on why they moved from AWS Athena to Ahana Cloud for better query performance and more control over their deployment.