When should I use ORC versus Parquet when using Presto?
If you're working with an open data lake built on open source and open formats, you aren't limited to a single file format: different tables can use different formats, and Presto works with both ORC and Parquet. The choice mostly comes down to optimizing for your workloads.
Both ORC and Parquet are columnar formats. Parquet is generally efficient in both storage and query performance, while ORC is well suited to storing data compactly and to skipping over irrelevant data using its built-in lightweight indexes, without complex or manually maintained indices. For example, ORC is typically a better fit for dimension tables, which tend to be smaller, while Parquet works better for fact tables, which are much bigger.
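To make "skipping over irrelevant data" concrete, here is a minimal sketch of the general idea: a columnar writer records min/max statistics for each chunk of rows, and a reader uses them to avoid reading chunks that cannot match a filter. This is a toy model in plain Python, not the actual ORC or Parquet layout or any Presto API; the `Chunk` class, the helper functions, and the `order_ids` column are all hypothetical names made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Chunk:
    """A group of rows for one column, plus its min/max statistics."""
    values: List[int]
    min_value: int
    max_value: int


def write_chunks(values: List[int], chunk_size: int) -> List[Chunk]:
    """Split a column into fixed-size chunks and record min/max for each."""
    chunks = []
    for start in range(0, len(values), chunk_size):
        part = values[start:start + chunk_size]
        chunks.append(Chunk(part, min(part), max(part)))
    return chunks


def scan_equal(chunks: List[Chunk], target: int) -> Tuple[List[int], int]:
    """Return the matching values and the number of chunks actually read."""
    matches: List[int] = []
    chunks_read = 0
    for chunk in chunks:
        # Skip the chunk when its statistics prove the target cannot be inside.
        if target < chunk.min_value or target > chunk.max_value:
            continue
        chunks_read += 1
        matches.extend(v for v in chunk.values if v == target)
    return matches, chunks_read


# Hypothetical sorted column of 1,000 order IDs, stored in chunks of 100 rows.
order_ids = list(range(1000))
chunks = write_chunks(order_ids, chunk_size=100)
matches, chunks_read = scan_equal(chunks, target=742)
print(matches, chunks_read, len(chunks))  # [742] 1 10
```

In this sketch only one of the ten chunks is read. The benefit depends on how the data is laid out: statistics like these prune the most when the file is sorted or clustered on the column being filtered, and much less when matching values are scattered across every chunk.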