Trino, a distributed SQL query engine, is known for its ability to process large amounts of semi-structured data using familiar SQL semantics. However, there are situations where an alternative may be more suitable. In this article, we explore four Trino alternatives that offer better price/performance for specific use cases.
What is Trino?
Trino is a distributed SQL query engine that supports various data sources, including relational and non-relational sources, through its connector architecture. It is a hard fork of the original Presto project, which was started at Facebook and later open-sourced in 2013.
The creators of Presto, who later became cofounders/CTOs of Starburst, began the hard fork in early 2019 under the name PrestoSQL; the project was rebranded as Trino in December 2020.
Trino has since diverged from Presto, and many of the innovations that the community is driving in Presto are not available in Trino. Trino is not hosted under the Apache Software Foundation (ASF) or Linux Foundation, but rather under the Trino Software Foundation, a non-profit corporation controlled by the cofounders of Starburst.
When Should You Use Trino?
Presto-based services – including Trino and PrestoDB – are designed for ad-hoc querying and analytical processing over data lakes, and allow developers to run interactive analytics against massive amounts of semi-structured data. Standard ANSI SQL semantics are supported, including complex queries, joins, and aggregations.
Trino or Presto should be used when a user wants to perform fast queries against large amounts of data from different data sources using familiar SQL semantics. It is suitable for organizations that want to use their existing SQL skills to query data without having to learn new complex languages.
Trino has also been mentioned for data science workloads in two situations: when a specific federated query requires high performance, and when you need to connect to data through Apache Hive as the backend.
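To make the idea of a federated query concrete, here is a minimal Python sketch that joins a table in one catalog with a table in another. Trino addresses tables as catalog.schema.table, but everything else here is an assumption for illustration: the catalogs `hive` and `postgresql`, the table and column names, the helper names, and the host, port, and user values are all hypothetical. The `run_query` helper assumes the `trino` Python client is installed and a coordinator is reachable.

```python
from textwrap import dedent

# Hypothetical tables in catalog.schema.table form. The "hive" and
# "postgresql" catalogs are assumed to be configured on the cluster.
ORDERS = "hive.sales.orders"
CUSTOMERS = "postgresql.crm.customers"


def build_federated_query(orders: str = ORDERS,
                          customers: str = CUSTOMERS,
                          limit: int = 10) -> str:
    """Return SQL that joins tables living in two different catalogs."""
    for name in (orders, customers):
        # Require fully qualified names so each table's catalog is explicit.
        if name.count(".") != 2:
            raise ValueError(f"expected catalog.schema.table, got {name!r}")
    return dedent(f"""\
        SELECT c.region, sum(o.total) AS revenue
        FROM {orders} AS o
        JOIN {customers} AS c ON o.customer_id = c.id
        GROUP BY c.region
        ORDER BY revenue DESC
        LIMIT {int(limit)}""")


def run_query(sql: str, host: str = "localhost", port: int = 8080,
              user: str = "analyst"):
    """Execute SQL through the trino DB-API client (assumed installed)."""
    from trino.dbapi import connect  # imported lazily: optional dependency

    conn = connect(host=host, port=port, user=user)
    cur = conn.cursor()
    cur.execute(sql)
    return cur.fetchall()


if __name__ == "__main__":
    print(build_federated_query())
```

The engine plans the join across both sources, so the caller writes one SQL statement rather than extracting data from each system separately.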
When Should You Look at Alternatives to Trino?
While Trino is a powerful and popular framework, there are situations where you might want to consider an alternative. These include:
- If you’re looking for an open-source project that has a strong governance structure and charter, then Trino is not the best choice, since it is governed by a vendor-controlled non-profit corporation. Users who prefer a project hosted under a well-known organization such as the ASF or The Linux Foundation may choose another tool instead of Trino.
- If you are looking for services and support from vendors, you should compare the functionality and price/performance provided by Trino to alternative tools such as Ahana, Amazon Athena, or Dremio.
- If you’re looking for a database management system that stores and manages data, then Trino is not suitable. Like Presto, Trino is a SQL query engine that queries the connected data stores and does not store data itself (although both tools can write the results of a query back to object storage).
4 Alternatives to Trino
If you’re looking for an alternative to Trino, consider one of the following:
- Open Source PrestoDB
- Ahana, managed service for Presto on AWS
- Amazon Athena, serverless service for Presto/Trino on AWS
- Dremio, serverless query engine based on Apache Arrow
1. PrestoDB – the original Presto distribution used at Facebook
As mentioned above, Trino began as a hard fork of PrestoDB. Trino was previously known as PrestoSQL before being rebranded in December 2020. The Presto Software Foundation was also rebranded as the Trino Software Foundation to reflect the fact that these are two separate and divergent projects.
While Trino and PrestoDB share a common history, they have different development teams and codebases, and may have different features, optimizations, and bug fixes.
Some key differences between PrestoDB and Trino:
- PrestoDB is tested and used by Facebook, Uber, Bytedance, and other internet-scale companies, while Trino is not.
- Presto is one of the fastest-growing open-source projects in the data analytics space.
- The Presto Foundation (part of The Linux Foundation) oversees PrestoDB, whereas Trino is mainly steered by a single company (Starburst).
- PrestoDB includes recent and ongoing innovations such as Project Aria, Project Presto Unlimited, additional user-defined functions, Presto-on-Spark, Disaggregated Coordinator, and the RaptorX Project.
See the full comparison: Presto vs Trino.
There are several ways you can get started with open source Presto, including running it on-premises, through a Docker container, and more (check out our getting started with Presto page).
2. Ahana Cloud: managed service for Presto on AWS
Ahana, a member of the Presto Foundation and contributor to the PrestoDB project, offers a managed, cloud-native version of open-source Presto called Ahana Cloud. It takes care of the hundreds of configurations and tuning parameters under the hood while still giving you more control and flexibility than a serverless offering.
Ahana also includes some features like Data Lake Caching for better performance and AWS Lake Formation integration to take advantage of granular data security.
Check out a demo of Ahana Cloud.
3. Amazon Athena: managed Presto/Trino service provisioned by AWS
Amazon Athena is a serverless, interactive query service that lets you analyze data stored in Amazon S3 using standard SQL. Originally based on PrestoDB, Athena now incorporates features from both Presto and Trino.
In our comparison between Athena and Trino-based Starburst, we concluded that:
- Starburst and Amazon Athena are both query engines used to query data from object stores such as Amazon S3, but there are some key differences.
- Starburst has features like Cached Views and pushdown capabilities, while Athena is optimized for fast performance with Amazon S3 and executes queries automatically in parallel.
- Users generally regard both Starburst and Athena as having good performance, but note that Starburst may require more customization and technical expertise, and Athena may need more optimization and sometimes has concurrency issues.
- Users have found Starburst and Athena to be relatively easy to use, but have also mentioned some drawbacks related to complex customization, lack of features, and difficulty debugging.
- In terms of cost, Athena charges a flat price of $5 per terabyte of data scanned, while Starburst’s pricing is more complex.
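As a rough illustration of the per-terabyte pricing quoted above, the following Python sketch estimates the cost of a query from the amount of data it scans. It uses only the flat $5 per terabyte figure from this article. The function name, the decimal definition of a terabyte, and the omission of any rounding or per-query minimums are assumptions, so treat the result as an approximation rather than a bill.

```python
PRICE_PER_TB_USD = 5.0   # flat rate quoted in this article
BYTES_PER_TB = 10 ** 12  # assumption: decimal terabyte


def estimate_scan_cost(bytes_scanned: int,
                       price_per_tb: float = PRICE_PER_TB_USD) -> float:
    """Approximate query cost in USD from the number of bytes scanned.

    Simplified model: ignores any rounding rules or minimum charges
    that actual billing may apply.
    """
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must be non-negative")
    return bytes_scanned / BYTES_PER_TB * price_per_tb


if __name__ == "__main__":
    # Hypothetical example: a query that scans 500 GB of data.
    print(f"${estimate_scan_cost(500 * 10 ** 9):.2f}")
```

Because the charge scales with bytes scanned, anything that reduces the data a query reads reduces its cost proportionally under this model.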
4. Dremio: serverless query engine based on Apache Arrow
Dremio, which is built on Apache Arrow, is another query engine that enables high-performance analytics directly on data lake storage.
According to Dremio’s website, Dremio offers interactive analytics directly on the lake and is often used for BI dashboards, whereas Starburst primarily supports ad-hoc workloads. Dremio provides self-service with a shared semantic layer for all users and tools, while Starburst lacks a semantic layer and data curation capabilities.
On the other hand, Starburst touts a cost-based optimizer that helps define an optimal plan based on the table statistics and other info it receives from plugins. Starburst’s custom connectors are optimized to be run in parallel, taking advantage of Trino’s MPP architecture.
While both platforms offer similar products, Dremio seems to be more focused on BI-oriented workloads reading from data lakes, whereas Starburst might be better suited for ad-hoc and federated queries.
Try Ahana Cloud’s managed Presto for free
If you’re evaluating SQL query engines, you’re in the right place. The easiest way to get started is with Ahana Cloud for Presto. You can try it for yourself, but we recommend scheduling a quick, no-strings-attached call with our solutions engineering team to understand your requirements and set up the environment.