PrestoDB Questions & Answers
Answers to your questions about PrestoDB
What is Amazon Redshift Used For? Introduction Amazon Redshift is one of the most widely-used services in the AWS ecosystem and is a familiar component in many cloud architectures. In … Continue reading What Is AWS Redshift Used For | Redshift Use Cases
ETL vs ELT in Data Warehousing Introduction ETL, or Extract, Transform, Load, is when an ETL tool or series of homegrown programs extracts data from a data source(s), often a … Continue reading Differences Between ETL and ELT in Data Warehousing | Ahana
Understanding AWS Athena Costs with Examples What Is Amazon Athena? Since you’re reading this to understand Athena costs, you likely already know, so we’ll just very briefly touch on what … Continue reading How Much Does Amazon Athena Cost? | Ahana
5 Components of Data Warehouse Architecture | Ahana The Data Warehouse has been around for decades. Born in the 1980s, it addressed the need for optimized analytics on data. As … Continue reading 5 Components of Data Warehouse Architecture | Ahana
AWS Lake Formation is a service that makes it easy to set up a secure data lake very quickly (in a matter of days), providing a governance layer for data lakes on AWS S3.
Here, we are going to talk about AWS Athena vs Glue, which is an interesting pairing as they are both complementary and competitive. So, what are they exactly?
Querying Parquet Files Using Amazon Athena Parquet is a columnar file format with many advantages over some of the more commonly used formats like CSV and JSON. … Continue reading How to Query Parquet Files using Amazon Athena | Ahana
This article is focused on the first step and how AWS Lake Formation Blueprints can make that easy and automated. Before you can run analytics to get insights, you need your data continuously pooling into your lake!
While the thrust of this article is an AWS Redshift Spectrum vs Athena comparison, there can be some confusion about the difference between AWS Redshift Spectrum and AWS Redshift. Very briefly, Redshift is the data warehouse, and Redshift Spectrum is an extension of Redshift that acts as a query engine over data stored in Amazon S3.
AWS Lake Formation vs AWS Glue – What are the differences? As you start building your analytics stack in AWS, there are several AWS technologies to understand. … Continue reading Difference Between AWS Lake Formation vs AWS Glue
Amazon S3 Select Limitations What is Amazon S3 Select? Amazon S3 Select allows you to use simple structured query language (SQL) statements to filter the contents of an Amazon S3 … Continue reading Limitations of Amazon S3 Select | AWS Select Capabilities
Querying Amazon S3 Data Using AWS Athena The data lake is becoming increasingly popular for more than just data storage. Now we see much more flexibility with what you can … Continue reading How To Query Data in Amazon S3 Using Athena | Ahana
What is AWS Lake Formation? For AWS users who want to get governance on their data lake, AWS Lake Formation is a service that makes it easy to set up … Continue reading What is AWS Lake Formation? | Amazon S3 Lake formation
How does Presto Work With LDAP? What is LDAP? The Lightweight Directory Access Protocol (LDAP) is an open, vendor-neutral, industry-standard application protocol used for directory services authentication. In LDAP … Continue reading How Presto Works with LDAP | Presto LDAP Authentication
What is Apache Ranger? Apache Ranger™ is a framework to enable, monitor and manage comprehensive data security across the data platform. It is an open-source authorization solution that provides access … Continue reading What is Apache Ranger | Apache Ranger in Hadoop | Ahana
The term Data Lakehouse has become very popular over the last year or so, especially as more customers are migrating their workloads to the cloud. This article will help to … Continue reading What is a Data Lakehouse Architecture?
Presto offers several classes of mathematical functions that operate on single values and mathematical operators that allow for operations on values across columns. In addition, aggregate functions can operate on … Continue reading How to use mathematical functions and operators and aggregate functions for Presto?
To find the difference in time between consecutive dates in a result set, Presto offers window functions. Take the example table below which contains sample data of users who watched … Continue reading How do I get the date_diff from previous rows?
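In Presto this pattern is typically written with the `lag` window function feeding `date_diff`, for example `date_diff('day', lag(watch_date) OVER (PARTITION BY user_id ORDER BY watch_date), watch_date)`, where `user_id` and `watch_date` are hypothetical column names. The Python sketch below mimics that logic on in-memory rows so the semantics are easy to see: rows are partitioned by user, ordered by date, and the first row of each partition gets `None`, just as `lag` returns NULL there.

```python
from datetime import date
from itertools import groupby
from operator import itemgetter


def days_since_previous(rows):
    """rows: iterable of (user_id, watch_date) tuples.

    Returns (user_id, watch_date, days_since_previous_watch) tuples,
    emulating date_diff('day', lag(watch_date) OVER (...), watch_date).
    """
    result = []
    # ORDER BY user_id, watch_date so groupby sees each partition contiguously.
    ordered = sorted(rows, key=itemgetter(0, 1))
    for user_id, group in groupby(ordered, key=itemgetter(0)):
        previous = None  # lag() yields NULL for the first row in a partition
        for _, watch_date in group:
            diff = None if previous is None else (watch_date - previous).days
            result.append((user_id, watch_date, diff))
            previous = watch_date
    return result
```

For example, a user who watched on 2021-01-01 and again on 2021-01-04 gets `None` for the first row and `3` for the second.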
The Presto approx_percentile is one of the approximate aggregate functions, and it returns an approximate percentile for a set of values (e.g. column). In this short article, we will explain … Continue reading How do I use the approx_percentile function in Presto?
Using Presto with a Hadoop cluster for SQL analytics is common, especially in on-premises deployments. With Presto, you can read and query data from the Hadoop datanodes but … Continue reading Can I write back or update data in my Hadoop / Apache Hive cluster through Presto?
Unix epoch time is often stored in databases, but it is not human-readable, so it has to be converted for reports and dashboards. Example of Unix Epoch … Continue reading How do I convert Unix Epoch time to a date or something more human readable with SQL?
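In Presto SQL the usual conversion is `from_unixtime(epoch_seconds)`, which returns a timestamp; values stored in milliseconds need to be divided by 1000 first. The Python sketch below shows the same arithmetic outside the database, assuming the stored values are seconds (or milliseconds) since 1970-01-01 UTC. The helper names are hypothetical.

```python
from datetime import datetime, timezone


def epoch_to_utc(epoch_seconds: float) -> datetime:
    """Convert seconds since the Unix epoch to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


def epoch_millis_to_utc(epoch_millis: int) -> datetime:
    """Same conversion for values stored as milliseconds since the epoch."""
    return epoch_to_utc(epoch_millis / 1000)


print(epoch_to_utc(1609459200).isoformat())  # 2021-01-01T00:00:00+00:00
```

Keeping the result timezone-aware avoids silently applying the local timezone of whatever machine renders the report.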
Hadoop is a system that manages both compute and data together. Hadoop cluster nodes have the HDFS file system and may also have different types of engines like Apache Hive, … Continue reading How do I transfer data from a Hadoop / Hive cluster to a Presto cluster?
Presto provides an overloaded substring function to extract characters from a string. We will use the string “Presto String Operations” to demonstrate the use of this function. Extract last 7 … Continue reading Presto substring operations: How do I get the X characters from a string of a known length?
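Presto's `substr(string, start)` and `substr(string, start, length)` use 1-based positions, and a negative `start` counts back from the end of the string. The Python sketch below emulates those rules so you can check an expression before running it; the function name is hypothetical, and the edge-case handling (a start of 0, out-of-range positions, a non-positive length returning an empty string) is our assumption about the behavior rather than a copy of Presto's source.

```python
from typing import Optional


def presto_substr(s: str, start: int, length: Optional[int] = None) -> str:
    """Emulate Presto-style substr with 1-based and negative positions."""
    n = len(s)
    if start == 0 or n == 0:
        return ""
    if start < 0:
        start = n + start + 1  # negative positions count back from the end
    if start < 1 or start > n:
        return ""
    begin = start - 1  # convert 1-based position to 0-based index
    if length is None:
        return s[begin:]
    if length <= 0:
        return ""
    return s[begin:begin + length]


text = "Presto String Operations"
print(presto_substr(text, -7))    # rations
print(presto_substr(text, 8, 6))  # String
```

The negative-start form is the convenient one when the string length varies: `substr(col, -7)` always returns the last 7 characters without computing `length(col)`.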
What is Spark SQL? Spark is a general purpose computation engine for large-scale data processing. At Spark’s inception, the primary abstraction was a resilient distributed dataset (RDD), an immutable distributed … Continue reading Spark SQL | What is Spark SQL & Spark SQL Guide | Ahana
How do I query a data lake with Presto? A data lake is a centralized repository that allows you to store all your structured and unstructured data at any scale. … Continue reading Query Data Lake With Presto | Presto Google Cloud | Ahana
Why am I getting a Presto EMR S3 timeout error? If you’re using AWS EMR Presto, you can use the S3 select pushdown feature to push down compute operations (i.e. … Continue reading Presto EMR S3 Timeout Error | Presto Query Timeout | Ahana
No, Presto queries your data in-place so you don’t need to move it. If you’re using AWS S3 for your data lake, for example, you wouldn’t need to ingest it … Continue reading Do I need to move my data to query it with Presto?
How do I sync my partition and metastore in Presto? Sync partition metadata syncs the metastore with the partition information on the file system or S3 for an external table. Depending … Continue reading Presto Sync Partition Metastore & Metadata | Presto Sync | Ahana
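Presto's Hive connector exposes this as the `system.sync_partition_metadata` procedure, whose mode is `ADD`, `DROP`, or `FULL`. The Python sketch below models only the decision logic as set differences between partitions registered in the metastore and partition directories found in storage; the function and the partition strings are hypothetical, and it does not talk to a real metastore.

```python
from typing import Set, Tuple


def plan_partition_sync(
    metastore: Set[str], storage: Set[str], mode: str = "ADD"
) -> Tuple[Set[str], Set[str]]:
    """Return (partitions_to_add, partitions_to_drop) for the given mode."""
    mode = mode.upper()
    if mode not in {"ADD", "DROP", "FULL"}:
        raise ValueError(f"unsupported mode: {mode}")
    # ADD: register directories that exist in storage but not in the metastore.
    to_add = storage - metastore if mode in {"ADD", "FULL"} else set()
    # DROP: unregister metastore partitions whose directories are gone.
    to_drop = metastore - storage if mode in {"DROP", "FULL"} else set()
    return to_add, to_drop


registered = {"ds=2021-01-01", "ds=2021-01-02"}
on_s3 = {"ds=2021-01-02", "ds=2021-01-03"}
print(plan_partition_sync(registered, on_s3, "FULL"))
```

`FULL` is simply the union of both actions, which is why it is the usual choice after partitions have been both added and deleted outside Presto.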
How do I run a CTAS (Create Table As) with a Query? There are a few different ways to run a CTAS with a Query in Presto. Below we’ll lay … Continue reading How To Run A CTAS With A Query | Create Table As Query
What is the difference between a managed table and external tables? The main difference between a managed and external table is that when you drop an external table, the underlying … Continue reading Difference Between Managed Table & External Tables | Ahana
What is Presto and what are its frequently asked questions (FAQ)?
What Is Trino & FAQs Trino can query data where it is stored, without needing to move data into a separate warehouse or analytics database. Queries are executed in parallel with … Continue reading What Is Trino & Trino Data | Trino SQL FAQs & Support | Ahana
Price-Performance Ratio of AWS Athena Presto vs Ahana Cloud for Presto Both AWS Athena and Ahana Cloud are based on the popular open-source Presto project which was originally developed by … Continue reading Price-Performance Ratio of AWS Athena Presto vs Ahana Cloud for Presto
What are the AWS Glue partition limits and does it apply to AWS Athena? Typically you’ll use AWS Glue to create the data sources and tables that Athena will query. … Continue reading AWS Glue Partition Limits For AWS Athena | Ahana
What level of concurrency performance can I expect using Presto as part of the AWS Athena service? I’m getting a lot of my workloads queued up when I use AWS … Continue reading Concurrency Performance Using Presto With AWS Athena Service | Ahana
How do I get deterministic performance out of Amazon Athena? What is Athena? Amazon Athena is an interactive query service based on Presto that makes it easy to analyze data … Continue reading Getting Deterministic Performance Out Of Amazon Athena Guide | Ahana
Do I have to use AWS Lambda to connect to data sources with Athena? The Athena Federated Query Journey AWS announced the public preview of Athena federated query in November … Continue reading Using AWS Lambda To Connect To Data Sources With Athena | Ahana
How do I do geospatial queries and spatial joins in Presto? A question that often comes up is “How do I do geospatial queries and spatial joins in Presto?” Fortunately … Continue reading Geospatial Queries & Spatial Joins In Presto Guide | Ahana
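A common building block for geospatial filters is the distance between two latitude/longitude points. Presto provides `great_circle_distance(latitude1, longitude1, latitude2, longitude2)`, which returns kilometers. The Python sketch below computes the same quantity with the haversine formula so you can sanity-check results; the Earth radius constant is an assumption and may differ slightly from the value Presto uses.

```python
import math

EARTH_RADIUS_KM = 6371.01  # assumed mean Earth radius in kilometers


def great_circle_distance_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Haversine great-circle distance between two points, in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point drift before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))
```

One degree of latitude comes out at roughly 111.2 km, which is a handy check when choosing a distance threshold for a join predicate.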
How do I query JSON documents with Presto? JSON documents are a common data type. A lot of people collect logs and load them into S3. Querying JSON with Presto … Continue reading How Do I Query JSON Documents With Presto | Query JSON Docs | Ahana
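When logs are stored as JSON strings, Presto's `json_extract_scalar(json, json_path)` pulls out a single scalar as a varchar, and returns NULL when the path does not resolve to a scalar. The Python sketch below imitates that behavior for simple dotted paths such as `$.user.id` only; it is a hypothetical helper for illustration and does not implement full JSONPath (no array indexes or bracket notation).

```python
import json
from typing import Any, Optional


def json_extract_scalar_sketch(document: str, path: str) -> Optional[str]:
    """Return the scalar at a dotted '$.a.b' path as a string, else None."""
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    value: Any = json.loads(document)
    for key in [part for part in path[1:].split(".") if part]:
        if not isinstance(value, dict) or key not in value:
            return None  # path does not exist
        value = value[key]
    if value is None or isinstance(value, (dict, list)):
        return None  # JSON null, objects, and arrays are not scalars
    if isinstance(value, str):
        return value
    return json.dumps(value)  # numbers and booleans rendered as JSON text


log_line = '{"level": "ERROR", "user": {"id": 42, "admin": false}, "tags": ["a"]}'
print(json_extract_scalar_sketch(log_line, "$.user.id"))  # 42
```

Returning strings for every scalar mirrors the varchar result in SQL, so numeric fields still need a `CAST` before arithmetic.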
Is there latency overhead for Presto queries if everything fits into memory and doesn’t need to be distributed? Presto is both in-memory and distributed, so each worker has memory and … Continue reading Latency Overhead For Presto Queries If Fits Into Memory | Ahana
Is the Hive metastore a hard dependency of Presto, or could Presto be configured to use something else like Postgres? With Presto, there’s no hard dependency on having to use … Continue reading Is Hive Metastore Hard Dependency Of Presto | Postgres With Presto | Ahana
The Differences Between Apache Drill vs Presto Drill is an open source SQL query engine inspired by the paper “Dremel: Interactive Analysis of Web-Scale Datasets” from Google in … Continue reading What are the differences between Presto and Apache Drill?
Why am I getting zero records when I use AWS Athena to query a CSV file? There’s a common error many AWS Athena users see when they query CSV files … Continue reading Zero Records Returned CSV | Zero Records AWS Athena | Ahana
Does Presto work natively with GraphQL? Some users may have a primary data store that is GraphQL-based (AWS AppSync) and want to leverage Presto. For context, GraphQL falls in the … Continue reading Presto Graphql | Does Presto Work Natively With GraphQL | Ahana
Why does a single AWS Athena query get stuck in QUEUED state before being executed? One of the drawbacks of AWS Athena is the fact that as a user, you … Continue reading Athena Query Waiting In Queue or Athena Query Stuck In Queued State
How Presto Joins Data Because Presto is a distributed system composed of a coordinator and workers, each worker can connect to one or more data sources through corresponding connectors. The … Continue reading How Presto Joins Data | Presto Data Connectors & Join Example | Ahana
Executing Presto Spark Executing Presto Spark queries is possible, but why leverage Spark as an execution framework for Presto’s queries when Presto is itself an efficient execution engine? The fact … Continue reading Executing Presto Spark | Using Spark’s Execution Engine With Presto | Ahana
When I run a query with AWS Athena, I get the error message ‘Query exhausted resources at this scale factor’. Why? AWS Athena is well documented as having performance issues, … Continue reading Query Exhausted Resources On This Scale Factor Error | Ahana