PrestoDB Questions & Answers

Answers to your questions about PrestoDB

Topics

What Is AWS Redshift Used For | Redshift Use Cases

May 13, 2022 · 6 min read

What is Amazon Redshift Used For? Introduction Amazon Redshift is one of the most widely used services in the AWS ecosystem and is a familiar component in many cloud architectures. In …

Differences Between ETL and ELT in Data Warehousing | Ahana

May 12, 2022 · 4 min read

ETL vs ELT in Data Warehousing Introduction ETL, or Extract Transform Load, is when an ETL tool or series of homegrown programs extracts data from a data source(s), often a …

How Much Does Amazon Athena Cost? | Ahana

Apr 25, 2022 · 3 min read

Understanding AWS Athena Costs with Examples What Is Amazon Athena? Since you’re reading this to understand Athena costs, you likely already know, so we’ll just very briefly touch on what …

5 Components of Data Warehouse Architecture | Ahana

Apr 25, 2022 · 5 min read

The Data Warehouse has been around for decades. Born in the 1980s, it addressed the need for optimized analytics on data. As …

AWS Lake Formation for Enterprise Data Lakes | Ahana Cloud

Apr 6, 2022 · 3 min read

AWS Lake Formation is a service that makes it easy to set up a secure data lake very quickly (in a matter of days), providing a governance layer for data lakes on AWS S3. 

How to Use AWS Athena to Query JSON Data | Ahana

Mar 30, 2022 · 4 min read

A popular use case for Athena is querying Parquet, ORC, CSV and JSON files, which are typically either queried directly or transformed and loaded into a data warehouse.

The Differences Between AWS Athena and AWS Glue | Ahana

Mar 16, 2022 · 4 min read

Here, we are going to talk about AWS Athena vs Glue, which is an interesting pairing as they are both complementary and competitive. So, what are they exactly?

How to Query Parquet Files using Amazon Athena | Ahana

Mar 9, 2022 · 4 min read

Querying Parquet Files using AWS Amazon Athena Parquet is one of the latest file formats with many advantages over some of the more commonly used formats like CSV and JSON. …

AWS Lake Formation Blueprints | Amazon Blueprint Types

Mar 7, 2022 · 2 min read

This article is focused on the first step and how AWS Lake Formation Blueprints can make that easy and automated. Before you can run analytics to get insights, you need your data continuously pooling into your lake!

The Differences Between AWS RedShift Spectrum vs Athena

Mar 4, 2022 · 5 min read

While the thrust of this article is an AWS Redshift Spectrum vs Athena comparison, there can be some confusion with the difference between AWS Redshift Spectrum and AWS Redshift. Very briefly, Redshift is the storage layer/data warehouse, and Redshift Spectrum is an extension to Redshift that is a query engine.

Difference Between AWS Lake Formation vs AWS Glue

Feb 22, 2022 · 3 min read

AWS Lake Formation vs AWS Glue – What are the differences? As you start building your analytics stack in AWS, there are several AWS technologies to understand as you begin. …

Limitations of Amazon S3 Select | AWS Select Capabilities

Feb 2, 2022 · 6 min read

Amazon S3 Select Limitations What is Amazon S3 Select? Amazon S3 Select allows you to use simple structured query language (SQL) statements to filter the contents of an Amazon S3 …

How To Query Data in Amazon S3 Using Athena | Ahana

Feb 1, 2022 · 3 min read

Querying Amazon S3 Data Using AWS Athena The data lake is becoming increasingly popular for more than just data storage. Now we see much more flexibility with what you can …

What is AWS Lake Formation? | Amazon S3 Lake formation

Jan 31, 2022 · 2 min read

What is AWS Lake Formation? For AWS users who want to get governance on their data lake, AWS Lake Formation is a service that makes it easy to set up …

How Presto Works with LDAP | Presto LDAP Authentication

Jan 28, 2022 · 2 min read

How does Presto Work With LDAP? What is LDAP? The Lightweight Directory Access Protocol (LDAP) is an open, vendor-neutral, industry-standard application protocol used for directory services authentication. In LDAP …

What is Apache Ranger | Apache Ranger in Hadoop | Ahana

Jan 28, 2022 · 2 min read

What is Apache Ranger? Apache Ranger™ is a framework to enable, monitor and manage comprehensive data security across the data platform. It is an open-source authorization solution that provides access …

What is a Data Lakehouse Architecture?

Nov 16, 2021 · 4 min read

The term Data Lakehouse has become very popular over the last year or so, especially as more customers are migrating their workloads to the cloud. This article will help to …

How to use mathematical functions and operators and aggregate functions for Presto?

Sep 21, 2021 · 6 min read

Presto offers several classes of mathematical functions that operate on single values and mathematical operators that allow for operations on values across columns. In addition, aggregate functions can operate on …

What is a Presto lag example?

Aug 31, 2021 · 2 min read

The Presto lag function is a window function that returns the value of an offset before the current row in a window. One common use case for the lag function is …

How do I get the date_diff from previous rows?

Aug 24, 2021 · 1 min read

To find the difference in time between consecutive dates in a result set, Presto offers window functions. Take the example table below which contains sample data of users who watched …

How do I use the approx_percentile function in Presto?

Aug 9, 2021 · 6 min read

The Presto approx_percentile is one of the approximate aggregate functions, and it returns an approximate percentile for a set of values (e.g. column). In this short article, we will explain …

Can I write back or update data in my Hadoop / Apache Hive cluster through Presto?

Jul 13, 2021 · 2 min read

Using Presto with a Hadoop cluster for SQL analytics is pretty common, especially in on-premises deployments. With Presto, you can read and query data from the Hadoop datanodes but …

How do I convert Unix Epoch time to a date or something more human readable with SQL?

Jul 13, 2021 · 1 min read

Many times the Unix Epoch Time gets stored in the database. But this is not very human readable and conversion is required for reports and dashboards. Example of Unix Epoch …

How do I transfer data from a Hadoop / Hive cluster to a Presto cluster?

Jul 13, 2021 · 2 min read

Hadoop is a system that manages both compute and data together. Hadoop cluster nodes have the HDFS file system and may also have different types of engines like Apache Hive, …

Presto substring operations: How do I get the X characters from a string of a known length?

Jul 7, 2021 · 2 min read

Presto provides an overloaded substring function to extract characters from a string. We will use the string “Presto String Operations” to demonstrate the use of this function. Extract last 7 …

Spark SQL | What is Spark SQL & Spark SQL Guide | Ahana

Jun 30, 2021 · 2 min read

What is Spark SQL? Spark is a general purpose computation engine for large-scale data processing. At Spark’s inception, the primary abstraction was a resilient distributed dataset (RDD), an immutable distributed …

Query Data Lake With Presto | Presto Google Cloud | Ahana

Jun 24, 2021 · 3 min read

How do I query a data lake with Presto? A data lake is a centralized repository that allows you to store all your structured and unstructured data at any scale. …

Presto EMR S3 Timeout Error | Presto Query Timeout | Ahana

Jun 24, 2021 · 2 min read

Why am I getting a Presto EMR S3 timeout error? If you’re using AWS EMR Presto, you can use the S3 select pushdown feature to push down compute operations (i.e. …

Do I need to move my data to query it with Presto?

Jun 3, 2021 · 1 min read

No, Presto queries your data in-place so you don’t need to move it. If you’re using AWS S3 for your data lake, for example, you wouldn’t need to ingest it …

Presto Sync Partition Metastore & Metadata | Presto Sync | Ahana

May 4, 2021 · 2 min read

How do I sync my partition and metastore in Presto? Sync partition metadata is used to sync the metastore with information on the file system/S3 for the external table. Depending …

How To Run A CTAS With A Query | Create Table As Query

May 4, 2021 · 2 min read

How do I run a CTAS (Create Table As) with a Query? There are a few different ways to run a CTAS with a Query in Presto. Below we’ll lay …

Difference Between Managed Table & External Tables | Ahana

May 4, 2021 · 2 min read

What is the difference between a managed table and external tables? The main difference between a managed and external table is that when you drop an external table, the underlying …

What Is Presto & Presto FAQ | Presto Help & Support | Ahana

Apr 27, 2021 · 9 min read

What is Presto and what are its frequently asked questions (FAQ)?

What Is Trino & Trino Data | Trino SQL FAQs & Support | Ahana

Apr 22, 2021 · 8 min read

What Is Trino & FAQs Trino can query data where it is stored, without needing to move data into a separate warehouse or analytics database. Queries are executed in parallel with …

Price-Performance Ratio of AWS Athena Presto vs Ahana Cloud for Presto

Apr 13, 2021 · 4 min read

Both AWS Athena and Ahana Cloud are based on the popular open-source Presto project which was originally developed by …

AWS Glue Partition Limits For AWS Athena | Ahana

Apr 1, 2021 · 2 min read

What are the AWS Glue partition limits and do they apply to AWS Athena? Typically you’ll use AWS Glue to create the data sources and tables that Athena will query. …

Concurrency Performance Using Presto With AWS Athena Service | Ahana

Mar 9, 2021 · 3 min read

What level of concurrency performance can I expect using Presto as part of the AWS Athena service? I’m getting a lot of my workloads queued up when I use AWS …

Getting Deterministic Performance Out Of Amazon Athena Guide | Ahana

Mar 5, 2021 · 5 min read

How do I get deterministic performance out of Amazon Athena? What is Athena? Amazon Athena is an interactive query service based on Presto that makes it easy to analyze data …

Using AWS Lambda To Connect To Data Sources With Athena | Ahana

Mar 5, 2021 · 3 min read

Do I have to use AWS Lambda to connect to data sources with Athena? The Athena Federated Query Journey AWS announced the public preview of Athena federated query in November …

Geospatial Queries & Spatial Joins In Presto Guide | Ahana

Mar 4, 2021 · 4 min read

How do I do geospatial queries and spatial joins in Presto? A question that often comes up is “how do I do geospatial queries and spatial joins in Presto?”. Fortunately …

How Do I Query JSON Documents With Presto | Query JSON Docs | Ahana

Mar 3, 2021 · 3 min read

How do I query JSON documents with Presto? JSON documents are a common data type. A lot of people collect logs and load them into S3. Querying JSON with Presto …

Latency Overhead For Presto Queries If Fits Into Memory | Ahana

Mar 1, 2021 · 2 min read

Is there latency overhead for Presto queries if everything fits into memory and doesn’t need to be distributed? Presto is both in-memory and distributed, so each worker has memory and …

Is Hive Metastore Hard Dependency Of Presto | Postgres With Presto | Ahana

Mar 1, 2021 · 1 min read

Is the Hive metastore a hard dependency of Presto, or could Presto be configured to use something else like Postgres? With Presto, there’s no hard dependency of having to use …

What are the differences between Presto and Apache Drill?

Mar 1, 2021 · 3 min read

The Differences Between Apache Drill vs Presto Drill is an open source SQL query engine which began life as a paper “Dremel: Interactive Analysis of Web-Scale Datasets” from Google in …

Zero Records Returned CSV | Zero Records AWS Athena | Ahana

Feb 3, 2021 · 2 min read

Why am I getting zero records when I use AWS Athena to query a CSV file? There’s a common error many AWS Athena users see when they query CSV files …

Presto Graphql | Does Presto Work Natively With GraphQL | Ahana

Feb 3, 2021 · 1 min read

Does Presto work natively with GraphQL? Some users may have a primary data store that is GraphQL-based (AWS AppSync) and want to leverage Presto. For context, GraphQL falls in the …

Athena Query Waiting In Queue or Athena Query Stuck In Queued State

Jan 26, 2021 · 2 min read

Why does a single AWS Athena query get stuck in QUEUED state before being executed? One of the drawbacks of AWS Athena is the fact that as a user, you …

How Presto Joins Data | Presto Data Connectors & Join Example | Ahana

Jan 14, 2021 · 4 min read

How Presto Joins Data Because Presto is a distributed system composed of a coordinator and workers, each worker can connect to one or more data sources through corresponding connectors. The …

Executing Presto Spark | Using Spark’s Execution Engine With Presto | Ahana

Jan 13, 2021 · 4 min read

Executing Presto Spark Executing Presto Spark queries is possible, but why leverage Spark as an execution framework for Presto’s queries when Presto is itself an efficient execution engine? The fact …

Query Exhausted Resources On This Scale Factor Error | Ahana

Jan 12, 2021 · 3 min read

When I run a query with AWS Athena, I get the error message ‘query exhausted resources on this scale factor’. Why? AWS Athena is well documented as having performance issues, …