When should I use ORC versus Parquet with Presto?
If you’re working with an open data lake built on open source and open formats, you may well have data in more than one file format. Presto can query both ORC and Parquet, so the choice comes down to optimizing for your workloads.
Both ORC and Parquet are columnar formats: they store data by column rather than by row. With Presto, Parquet is generally efficient in terms of both storage and query performance. ORC, on the other hand, is well suited to storing data compactly and to skipping over irrelevant data without complex or manually maintained indexes. As a rule of thumb, ORC is often the better fit for dimension tables, which tend to be smaller, while Parquet tends to work better for fact tables, which are much larger.
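To make the rule of thumb above concrete, here is a minimal Python sketch that renders Presto CREATE TABLE statements and picks the storage format from the table's role. It assumes the tables live in a Hive connector catalog, where the file format is selected with the `format` table property. The catalog, schema, table, and column names are hypothetical, and the role-to-format mapping simply restates the guidance above rather than anything Presto enforces.

```python
# Rule of thumb from the text: ORC for smaller dimension tables,
# Parquet for larger fact tables. Illustrative only, not a Presto rule.
FORMAT_BY_ROLE = {"dimension": "ORC", "fact": "PARQUET"}


def create_table_sql(table, columns, role):
    """Render a Presto CREATE TABLE statement for a Hive connector table.

    table   -- fully qualified name, e.g. "hive.sales.orders" (hypothetical)
    columns -- list of (column_name, presto_type) pairs
    role    -- "dimension" or "fact"
    """
    storage_format = FORMAT_BY_ROLE[role]
    column_list = ", ".join(f"{name} {type_}" for name, type_ in columns)
    # `format` is the Hive connector table property that sets the file format.
    return (
        f"CREATE TABLE {table} ({column_list}) "
        f"WITH (format = '{storage_format}')"
    )


if __name__ == "__main__":
    # Hypothetical dimension table, stored as ORC.
    print(create_table_sql(
        "hive.sales.customers",
        [("customer_id", "bigint"), ("name", "varchar")],
        "dimension",
    ))
    # Hypothetical fact table, stored as Parquet.
    print(create_table_sql(
        "hive.sales.orders",
        [("order_id", "bigint"), ("customer_id", "bigint"), ("total", "double")],
        "fact",
    ))
```

The generated statements can be run from the Presto CLI or any Presto client. Because both formats can live side by side in the same catalog, it is worth benchmarking each table with your own queries before settling on one.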
If you’re looking to get up and running quickly with Presto, you can check out Ahana Cloud. It’s a SaaS for Presto and takes care of all the configuration, tuning, deployment, etc.