
Multi-level Data Lake Caching with RaptorX for Ahana Cloud

One-click caching built into every Presto cluster for up to 30x query performance improvements

Multi-level data lake caching not only eliminates the need to repeatedly read the same data from data lakes like S3, it also reduces query latency by eliminating frequent RPC calls at the metastore, file metadata, and scheduler file list levels. It additionally saves CPU cycles by caching intermediate result sets at the worker level.

Ahana Cloud users can enable data lake caching with the click of a button when creating a new Presto cluster. The rest, including attaching SSDs, sizing the cache based on the user-selected instance type, and optimizing scheduling, is handled automatically.

Ready to speed up your Presto queries?
Sign up for a free trial of Ahana Cloud today 😀

Benefits

97%

latency reductions for mixed workloads

30x

Query performance

Get analytics on your data faster

You no longer have to re-read the same data from the data lake itself; data is cached on each Presto worker node in the Ahana in-VPC compute plane.

Segment-based Data IO Cache

A disk cache colocated with each worker node that stores data read from S3 storage (ORC, Parquet, etc.). Caching data close to compute improves performance by avoiding repeated fetches of the same data across the network.
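To make the idea concrete, here is a minimal Python sketch of a segment-based, read-through cache. It is an illustration only, not Ahana's or Presto's implementation: the `SegmentCache` class, the `fetch_range` callable, the tiny `SEGMENT_SIZE`, and the in-memory `REMOTE` dictionary standing in for S3 are all assumptions made for the example.

```python
from collections import OrderedDict

SEGMENT_SIZE = 4  # bytes per segment; unrealistically small, for illustration only


class SegmentCache:
    """Read-through LRU cache keyed by (file path, segment index)."""

    def __init__(self, fetch_range, capacity_segments):
        self.fetch_range = fetch_range    # callable(path, offset, length) -> bytes
        self.capacity = capacity_segments
        self.segments = OrderedDict()     # insertion order doubles as LRU order
        self.remote_reads = 0             # counts trips to remote storage

    def _segment(self, path, index):
        key = (path, index)
        if key in self.segments:
            self.segments.move_to_end(key)  # mark as most recently used
            return self.segments[key]
        data = self.fetch_range(path, index * SEGMENT_SIZE, SEGMENT_SIZE)
        self.remote_reads += 1
        self.segments[key] = data
        if len(self.segments) > self.capacity:
            self.segments.popitem(last=False)  # evict least recently used
        return data

    def read(self, path, offset, length):
        if length <= 0:
            return b""
        first = offset // SEGMENT_SIZE
        last = (offset + length - 1) // SEGMENT_SIZE
        blob = b"".join(self._segment(path, i) for i in range(first, last + 1))
        start = offset - first * SEGMENT_SIZE
        return blob[start:start + length]


# Hypothetical stand-in for remote object storage.
REMOTE = {"s3://bucket/table/part-0.orc": b"abcdefghijklmnopqrstuvwxyz"}


def fetch_range(path, offset, length):
    return REMOTE[path][offset:offset + length]


cache = SegmentCache(fetch_range, capacity_segments=8)
first = cache.read("s3://bucket/table/part-0.orc", 2, 6)   # touches segments 0 and 1
second = cache.read("s3://bucket/table/part-0.orc", 4, 4)  # segment 1 is already cached
```

In this sketch the second read overlaps a segment fetched by the first, so it is served locally and `remote_reads` stays at 2.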

Intermediate Result Set Cache

Caches partially computed result sets on the worker's local SSD. This prevents duplicated computation across multiple queries, which improves query performance and decreases CPU utilization.
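As a rough illustration of how partial results can be reused across queries, the Python sketch below keys each cached result by a plan fragment and the split it ran on. The `FragmentResultCache` class, the string used as a plan fragment, and the split names are hypothetical, and the results live in a dictionary rather than on SSD to keep the example short.

```python
import hashlib


class FragmentResultCache:
    """Caches the partial result of a plan fragment for one data split."""

    def __init__(self):
        self.results = {}
        self.computations = 0  # how many times the work was actually done

    @staticmethod
    def _key(plan_fragment, split_id):
        raw = f"{plan_fragment}\x00{split_id}".encode()
        return hashlib.sha256(raw).hexdigest()

    def get_or_compute(self, plan_fragment, split_id, compute):
        key = self._key(plan_fragment, split_id)
        if key not in self.results:
            self.results[key] = compute()
            self.computations += 1
        return self.results[key]


# Hypothetical data: two splits of one table.
rows_by_split = {"split-0": [1, 2, 3], "split-1": [4, 5]}
result_cache = FragmentResultCache()
fragment = "SUM(x) FROM t"  # stands in for a canonicalized plan fragment


def partial_sum(split_id):
    return result_cache.get_or_compute(
        fragment, split_id, lambda: sum(rows_by_split[split_id])
    )


total_first_query = sum(partial_sum(s) for s in rows_by_split)
total_second_query = sum(partial_sum(s) for s in rows_by_split)  # reuses cached partials
```

Both queries produce the same total, but the per-split sums are computed only once each.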

Metastore and File List Cache

The Presto coordinator caches table metadata (schema, partition list, and partition info) and file lists to avoid slow calls to the metastore and remote storage. This in-memory cache helps reduce query latency.
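One common way to build such an in-memory cache is with a time-to-live (TTL) per entry, so that stale listings eventually refresh. The Python sketch below assumes that approach; the `TtlCache` class, the `list_files` loader, the 60-second TTL, and the partition path are all made up for illustration and are not taken from Presto.

```python
import time


class TtlCache:
    """In-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self.loader = loader      # called on a miss, e.g. a metastore or storage listing
        self.ttl = ttl_seconds
        self.clock = clock        # injectable so the example can control time
        self.entries = {}         # key -> (expires_at, value)
        self.loads = 0            # counts calls to the slow backend

    def get(self, key):
        now = self.clock()
        entry = self.entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = self.loader(key)
        self.loads += 1
        self.entries[key] = (now + self.ttl, value)
        return value


# Hypothetical stand-in for listing the files of a partition in remote storage.
def list_files(partition):
    return [f"{partition}/part-0.orc", f"{partition}/part-1.orc"]


now = [0.0]  # fake clock, advanced by hand below
file_lists = TtlCache(list_files, ttl_seconds=60, clock=lambda: now[0])

partition = "s3://bucket/table/ds=2021-01-01"
file_lists.get(partition)       # miss: calls list_files
file_lists.get(partition)       # hit: served from memory
loads_before_expiry = file_lists.loads
now[0] = 61.0
file_lists.get(partition)       # entry expired: calls list_files again
```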

File Metadata Cache

Caches open file descriptors and stripe/file footer information in worker memory. These are the most frequently accessed pieces of data when reading files. This cache not only decreases query latency but also reduces CPU utilization.

Soft Affinity Scheduling

With soft affinity scheduling, the Presto coordinator schedules requests that process the same data/files on the same Presto worker node to maximize cache hits. Consistently sending requests for the same data to the same worker node means fewer remote calls to retrieve data.
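The scheduling idea can be sketched in a few lines of Python. This example uses rendezvous (highest-random-weight) hashing to pick a stable preferred worker per file, with a fallback when that worker is busy, which is what makes the affinity "soft". The hashing scheme, function names, and worker names are assumptions chosen for illustration and may differ from what Presto does internally.

```python
import hashlib


def _score(worker, path):
    """Deterministic pseudo-random weight for a (worker, file) pair."""
    digest = hashlib.sha256(f"{worker}|{path}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def preferred_workers(path, workers):
    """Rank workers for a file; the same file always yields the same ranking."""
    return sorted(workers, key=lambda w: _score(w, path), reverse=True)


def assign_split(path, workers, busy=frozenset()):
    """Pick the highest-ranked worker that is not busy.

    If every worker is busy, fall back to the top-ranked one.
    """
    ranked = preferred_workers(path, workers)
    for worker in ranked:
        if worker not in busy:
            return worker
    return ranked[0]


workers = ["worker-1", "worker-2", "worker-3"]  # hypothetical node names
path = "s3://bucket/table/part-0.orc"

home = assign_split(path, workers)                   # stable choice for this file
again = assign_split(path, workers)                  # same worker, so its cache is reused
fallback = assign_split(path, workers, busy={home})  # next-best worker when home is busy
```

Because the ranking depends only on the file path and the worker names, repeated queries over the same file land on the same worker as long as it has capacity.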

Benchmarks

With RaptorX multi-level caching, users see up to 97% latency reductions for concurrent workloads

Workload Information

  • 30-40K partitions
  • 8.5 billion rows
  • 150 GB of data

Multi-level caching with RaptorX
