Complex SQL Queries
Complex SQL queries benefit from Presto’s distributed, parallel, in-memory processing architecture and cost-based optimizer. And with Presto’s federation capabilities even more complex queries can be unleashed on multiple data sources in a single query.
To ensure optimum performance with complex queries Presto has features like dynamic filtering which can significantly improve the performance of queries with selective joins by avoiding reading data that would be filtered by join condition. And the collection of table statistics using ANALYZE tablename is highly recommended.
Complex SQL features and functions are used when advanced calculations are needed, or when many tables (perhaps from multiple sources) are to be joined, when dealing with nested or repeated data, dealing with time-series data or complex data types like maps, arrays, structs and JSON, or perhaps a combination of all these things.
Presto’s ANSI SQL engine supports numerous advanced functions which can be split into the following categories – links to the PrestoDB documentation are provided for convenience:
- Functions for JSON and other complex data types – https://prestodb.io/docs/current/functions/json.html
- Advanced aggregation techniques – https://prestodb.io/docs/current/functions/aggregate.html
- Window functions – https://prestodb.io/docs/current/functions/window.html
- Array and map functions – https://prestodb.io/docs/current/functions/array.html and https://prestodb.io/docs/current/functions/map.html
- Lambda expressions to assist with nested data – https://prestodb.io/blog/2020/03/02/presto-lambda
- Geospatial – https://prestodb.io/docs/current/functions/geospatial.html
- Sampling – https://prestodb.io/docs/current/sql/select.html “Table Sampling” section.