Presto equivalent of mysql group_concat
As you may know, PrestoDB supports ANSI SQL and includes support for several SQL dialects. These dialects include MySQL dialect, making it easy to group and aggregate data in a variety of ways. However not ALL functions in MySQL are supported by PrestoDB!
Now, let us look at the really useful MySQL and MariaDB SQL function GROUP_CONCAT(). This function is used to concatenate data in column(s) from multiple rows into one field. It is an aggregate (GROUP BY) function which returns a String. This is assuming the group contains at least one non-NULL value (otherwise it returns NULL). GROUP_CONCAT() is an example of a function that is not yet supported by PrestoDB and this is the error you will see if you try using it to get a list of customers that have ordered something along with their order priorities:
presto> use tpch.sf1; presto:sf1> select custkey, GROUP_CONCAT(DISTINCT orderpriority ORDER BY orderpriority SEPARATOR ',') as OrderPriorities from orders GROUP BY custkey; Query 20200925_105949_00013_68x9u failed: line 1:16: Function group_concat not registered
Is there a way to handle this? If so what’s the workaround? There is!
array_join() and array_agg() to the rescue!
presto:sf1> select custkey,array_join(array_distinct(array_agg(orderpriority)),',') as OrderPriorities from orders group by custkey; custkey | OrderPriorities ---------+------------------------------------------------ 69577 | 2-HIGH,1-URGENT,3-MEDIUM,5-LOW,4-NOT SPECIFIED 52156 | 4-NOT SPECIFIED,3-MEDIUM,1-URGENT,5-LOW,2-HIGH 108388 | 5-LOW,4-NOT SPECIFIED,2-HIGH,3-MEDIUM,1-URGENT 111874 | 5-LOW,1-URGENT,2-HIGH,4-NOT SPECIFIED 108616 | 1-URGENT,5-LOW,4-NOT SPECIFIED,3-MEDIUM,2-HIGH (only the first 5 rows displayed)
If you do not want to use the DISTINCT operator (you want duplicates in your result set in other words) then there is an easy solution. To skip the DISTINCT operator, simply drop the array_distinct() function from your query:
presto:sf1> select custkey,array_join(array_agg(orderpriority),',') as OrderPriorities from orders group by custkey; custkey | OrderPriorities ---------+-------------------------------------------------------------------------------- 24499 | 5-LOW,1-URGENT,4-NOT SPECIFIED,3-MEDIUM,2-HIGH,4-NOT SPECIFIED,3-MEDIUM,1-URGENT,2-HIGH,3-MEDIUM,1-URGENT,5-LOW,3-MEDIUM,4-NOT SPECIFIED,4-NOT SPECIFIED,4-NOT SPECIFIED,3-MEDIUM,3-MEDIUM,5-LOW,1-URGENT,1-URGENT,4-NOT SPECIFIE 58279 | 4-NOT SPECIFIED,2-HIGH,5-LOW,1-URGENT,1-URGENT,5-LOW,5-LOW,4-NOT SPECIFIED,1-URGENT,4-NOT SPECIFIED,5-LOW,3-MEDIUM,1-URGENT,4-NOT SPECIFIED,4-NOT SPECIFIED,1-URGENT,5-LOW,5-LOW,3-MEDIUM,3-MEDIUM,1-URGENT,3-MEDIUM,2-HIGH,5-LOW 142027 | 1-URGENT,2-HIGH,2-HIGH,1-URGENT,3-MEDIUM,1-URGENT,5-LOW,4-NOT SPECIFIED,4-NOT SPECIFIED,2-HIGH,3-MEDIUM,2-HIGH,1-URGENT,3-MEDIUM,5-LOW,3-MEDIUM,4-NOT SPECIFIED,2-HIGH,1-URGENT,5-LOW,2-HIGH,5-LOW,1-URGENT,4-NOT SPECIFIED,2-HIG 94169 | 1-URGENT,4-NOT SPECIFIED,4-NOT SPECIFIED,1-URGENT,4-NOT SPECIFIED,3-MEDIUM,4-NOT SPECIFIED,3-MEDIUM,4-NOT SPECIFIED,5-LOW,4-NOT SPECIFIED,2-HIGH,5-LOW,4-NOT SPECIFIED 31607 | 4-NOT SPECIFIED,2-HIGH,4-NOT SPECIFIED,2-HIGH,2-HIGH,5-LOW
You can, of course specify the separator character. In the example shown above, I have used a comma as the separator.
It’s worth noting that, like PrestoDB, there wasn’t a T-SQL equivalent of the MySQL GROUP_CONCAT() function in Microsoft SQL Server either. However, T-SQL now has the STRING_AGG() function which is available from SQL Server 2017 onwards.
And hey, presto, you now have a working Presto equivalent of mysql group_concat.
Understanding the Presto equivalent of mysql group_concat – Now what?
If you are looking for more tips are tricks, here’s your next step. Check out our Answers section to learn more about PrestoDB, competitor reviews and comparisons, and more technical guides and resources to get you started.
What’s Presto, how did it start, and what is it for? Ready to answer these questions? Take a deeper dive into understanding Presto. Learn what PrestoDB is, how it got started, and the benefits for Presto users.
What is AWS Lake Formation? AWS lake formation helps users to build, manage, and secure their data lakes in a very short amount of time, meaning days instead of months as is common with a traditional data lake approach. Learn more about AWS Lake Formation, including the pros and cons of Amazon Lake Formation.
Looking to learn more about data warehouses? Start here for a deeper look. This article will cover topics like what it is: a data warehouse is a data repository. Also get more info about their use: typically a warehouse is used for analytic systems and Business Intelligence tools. Take a look at this article to get a better understand of what it is, how it’s used, and the pros and cons of a data warehouse compared to a data lake.