Presto equivalent of mysql group_concat
PrestoDB supports ANSI SQL and includes support for several SQL dialects, including MySQL dialect, making it easy to group and aggregate data in a variety of ways. However not ALL functions in MySQL are supported by PrestoDB!
Let’s look at the really useful MySQL and MariaDB SQL function GROUP_CONCAT() – this is used to concatenate data in column(s) from multiple rows into one field. It is an aggregate (GROUP BY) function which returns a String, assuming the group contains at least one non-NULL value (otherwise it returns NULL). GROUP_CONCAT() is an example of a function that is not yet supported by PrestoDB and this is the error you’ll see if you try using it to get a list of customers that have ordered something along with their order priorities:
presto> use tpch.sf1;
presto:sf1> select custkey, GROUP_CONCAT(DISTINCT orderpriority ORDER BY orderpriority SEPARATOR ',') as OrderPriorities from orders GROUP BY custkey;
Query 20200925_105949_00013_68x9u failed: line 1:16: Function group_concat not registered
So what’s the workaround? array_join() and array_agg() to the rescue!
presto:sf1> select custkey,array_join(array_distinct(array_agg(orderpriority)),',') as OrderPriorities from orders group by custkey;
custkey | OrderPriorities
---------+------------------------------------------------
69577 | 2-HIGH,1-URGENT,3-MEDIUM,5-LOW,4-NOT SPECIFIED
52156 | 4-NOT SPECIFIED,3-MEDIUM,1-URGENT,5-LOW,2-HIGH
108388 | 5-LOW,4-NOT SPECIFIED,2-HIGH,3-MEDIUM,1-URGENT
111874 | 5-LOW,1-URGENT,2-HIGH,4-NOT SPECIFIED
108616 | 1-URGENT,5-LOW,4-NOT SPECIFIED,3-MEDIUM,2-HIGH
(only the first 5 rows displayed)
If you do not want to use the DISTINCT operator (you want duplicates in your result set in other words) then simply drop the array_distinct() function from your query:
presto:sf1> select custkey,array_join(array_agg(orderpriority),',') as OrderPriorities from orders group by custkey;
custkey | OrderPriorities
---------+--------------------------------------------------------------------------------
24499 | 5-LOW,1-URGENT,4-NOT SPECIFIED,3-MEDIUM,2-HIGH,4-NOT SPECIFIED,3-MEDIUM,1-URGENT,2-HIGH,3-MEDIUM,1-URGENT,5-LOW,3-MEDIUM,4-NOT SPECIFIED,4-NOT SPECIFIED,4-NOT SPECIFIED,3-MEDIUM,3-MEDIUM,5-LOW,1-URGENT,1-URGENT,4-NOT SPECIFIE
58279 | 4-NOT SPECIFIED,2-HIGH,5-LOW,1-URGENT,1-URGENT,5-LOW,5-LOW,4-NOT SPECIFIED,1-URGENT,4-NOT SPECIFIED,5-LOW,3-MEDIUM,1-URGENT,4-NOT SPECIFIED,4-NOT SPECIFIED,1-URGENT,5-LOW,5-LOW,3-MEDIUM,3-MEDIUM,1-URGENT,3-MEDIUM,2-HIGH,5-LOW
142027 | 1-URGENT,2-HIGH,2-HIGH,1-URGENT,3-MEDIUM,1-URGENT,5-LOW,4-NOT SPECIFIED,4-NOT SPECIFIED,2-HIGH,3-MEDIUM,2-HIGH,1-URGENT,3-MEDIUM,5-LOW,3-MEDIUM,4-NOT SPECIFIED,2-HIGH,1-URGENT,5-LOW,2-HIGH,5-LOW,1-URGENT,4-NOT SPECIFIED,2-HIG
94169 | 1-URGENT,4-NOT SPECIFIED,4-NOT SPECIFIED,1-URGENT,4-NOT SPECIFIED,3-MEDIUM,4-NOT SPECIFIED,3-MEDIUM,4-NOT SPECIFIED,5-LOW,4-NOT SPECIFIED,2-HIGH,5-LOW,4-NOT SPECIFIED
31607 | 4-NOT SPECIFIED,2-HIGH,4-NOT SPECIFIED,2-HIGH,2-HIGH,5-LOW
You can of course specify the separator character – in the above example I’ve used a comma.
It’s worth noting that, like PrestoDB, there wasn’t a T-SQL equivalent of the MySQL GROUP_CONCAT() function in Microsoft SQL Server either. However, T-SQL now has the STRING_AGG() function which is available from SQL Server 2017 onwards.
And hey, presto, you now have a working Presto equivalent of mysql group_concat.