Benchmarking Warehouse Workloads on the Data Lake using Presto

TPC-H Benchmark Whitepaper

How to run a TPC-H Benchmark on Presto

Presto is an open source MPP (massively parallel processing) query engine designed from the ground up for high performance with linear scaling. Businesses that want to run their analytics workloads on Presto need to understand how to evaluate its performance, and this document is intended to help with benchmarking Presto.

To help users who would like to benchmark Presto, we’ve written a technical guide on how to set up your Presto benchmark using Benchto, an open source framework that provides an easy and manageable way to define, run, and analyze macro benchmarks in a clustered environment.

Running a benchmark on Presto can help you identify things like:

  • system resource requirements 
  • resource usage during various operations 
  • performance metrics for such operations
  • and more, depending on your workload and use case

This technical guide provides an overview of TPC-H, an industry-standard benchmark for decision support (analytics) workloads, and explains how to configure and use the open source Benchto tool to benchmark Presto. It also shows an example of comparing results between two runs of an Ahana-managed Presto cluster, one with cache enabled and one without.
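
As a rough illustration of what comparing two runs can look like, here is a minimal Python sketch. It is not Benchto's own reporting and does not read Benchto output: the function name `compare_runs`, the query labels, and the timings are all hypothetical placeholders invented for this example, assuming you have already collected one wall-clock duration per query for each run.

```python
def compare_runs(baseline, candidate):
    """Compare per-query durations (in seconds) from two benchmark runs.

    Only queries present in both runs are reported. The speedup is the
    baseline time divided by the candidate time, so values above 1.0
    mean the candidate run was faster.
    """
    report = {}
    for query in sorted(baseline.keys() & candidate.keys()):
        base_s = baseline[query]
        cand_s = candidate[query]
        report[query] = {
            "baseline_s": base_s,
            "candidate_s": cand_s,
            "speedup": base_s / cand_s,
        }
    return report


# Placeholder timings, invented for illustration only (not measured results).
no_cache = {"q01": 10.0, "q06": 4.0}
with_cache = {"q01": 5.0, "q06": 4.0}

for query, row in compare_runs(no_cache, with_cache).items():
    print(
        f"{query}: {row['baseline_s']:.1f}s -> {row['candidate_s']:.1f}s "
        f"({row['speedup']:.2f}x)"
    )
```

Reporting a per-query ratio rather than a single total keeps the queries that benefit from caching visible next to those that do not.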

We hope you find this useful! Happy benchmarking.