Presto equivalent of mysql group_concat

PrestoDB supports ANSI SQL and includes support for several SQL dialects, including MySQL dialect, making it easy to group and aggregate data in a variety of ways. However not ALL functions in MySQL are supported by PrestoDB! 

Let’s look at the really useful MySQL and MariaDB SQL function GROUP_CONCAT() – this is used to concatenate data in column(s) from multiple rows into one field. It is an aggregate (GROUP BY) function which returns a String, assuming the group contains at least one non-NULL value (otherwise it returns NULL). GROUP_CONCAT() is an example of a function that is not yet supported by PrestoDB and this is the error you’ll see if you try using it to get a list of customers that have ordered something along with their order priorities:

presto> use tpch.sf1;

presto:sf1> select custkey, GROUP_CONCAT(DISTINCT orderpriority ORDER BY orderpriority SEPARATOR ',') as OrderPriorities from orders GROUP BY custkey;

Query 20200925_105949_00013_68x9u failed: line 1:16: Function group_concat not registered

So what’s the workaround?  array_join() and array_agg() to the rescue! 

presto:sf1> select custkey,array_join(array_distinct(array_agg(orderpriority)),',') as OrderPriorities from orders group by custkey;
 custkey |                OrderPriorities                 
---------+------------------------------------------------
   69577 | 2-HIGH,1-URGENT,3-MEDIUM,5-LOW,4-NOT SPECIFIED 
   52156 | 4-NOT SPECIFIED,3-MEDIUM,1-URGENT,5-LOW,2-HIGH 
  108388 | 5-LOW,4-NOT SPECIFIED,2-HIGH,3-MEDIUM,1-URGENT 
  111874 | 5-LOW,1-URGENT,2-HIGH,4-NOT SPECIFIED          
  108616 | 1-URGENT,5-LOW,4-NOT SPECIFIED,3-MEDIUM,2-HIGH 
(only the first 5 rows displayed) 

If you do not want to use the DISTINCT operator (you want duplicates in your result set in other words) then simply drop the array_distinct() function from your query:

presto:sf1> select custkey,array_join(array_agg(orderpriority),',') as OrderPriorities from orders group by custkey;
 custkey | OrderPriorities                             
---------+-------------------------------------------------------------------------------- 
   24499 | 5-LOW,1-URGENT,4-NOT SPECIFIED,3-MEDIUM,2-HIGH,4-NOT SPECIFIED,3-MEDIUM,1-URGENT,2-HIGH,3-MEDIUM,1-URGENT,5-LOW,3-MEDIUM,4-NOT SPECIFIED,4-NOT SPECIFIED,4-NOT SPECIFIED,3-MEDIUM,3-MEDIUM,5-LOW,1-URGENT,1-URGENT,4-NOT SPECIFIE
   58279 | 4-NOT SPECIFIED,2-HIGH,5-LOW,1-URGENT,1-URGENT,5-LOW,5-LOW,4-NOT SPECIFIED,1-URGENT,4-NOT SPECIFIED,5-LOW,3-MEDIUM,1-URGENT,4-NOT SPECIFIED,4-NOT SPECIFIED,1-URGENT,5-LOW,5-LOW,3-MEDIUM,3-MEDIUM,1-URGENT,3-MEDIUM,2-HIGH,5-LOW
  142027 | 1-URGENT,2-HIGH,2-HIGH,1-URGENT,3-MEDIUM,1-URGENT,5-LOW,4-NOT SPECIFIED,4-NOT SPECIFIED,2-HIGH,3-MEDIUM,2-HIGH,1-URGENT,3-MEDIUM,5-LOW,3-MEDIUM,4-NOT SPECIFIED,2-HIGH,1-URGENT,5-LOW,2-HIGH,5-LOW,1-URGENT,4-NOT SPECIFIED,2-HIG
   94169 | 1-URGENT,4-NOT SPECIFIED,4-NOT SPECIFIED,1-URGENT,4-NOT SPECIFIED,3-MEDIUM,4-NOT SPECIFIED,3-MEDIUM,4-NOT SPECIFIED,5-LOW,4-NOT SPECIFIED,2-HIGH,5-LOW,4-NOT SPECIFIED                                                                                                                                                                                                        
   31607 | 4-NOT SPECIFIED,2-HIGH,4-NOT SPECIFIED,2-HIGH,2-HIGH,5-LOW 

You can of course specify the separator character – in the above example I’ve used a comma.

It’s worth noting that, like PrestoDB, there wasn’t a T-SQL equivalent of the MySQL GROUP_CONCAT() function in Microsoft SQL Server either.  However, T-SQL now has the STRING_AGG() function which is available from SQL Server 2017 onwards.

And hey, presto, you now have a working Presto equivalent of mysql group_concat.