Extract Keys in a Nested JSON Array Object With Presto

Let’s say we have some JSON that looks like this:

[{"id": 1, "value":"xxx"}, {"id": 2, "value":"yyy"}]

In this instance our json contains key value pairs. How can we decode that using PrestoDB SQL to extract only the keys (1,2 in this example)?  

We can extract keys in a nested JSON array object using SQL arrays, the JSON_PARSE function and the TRANSFORM function.  The TRANSFORM function’s syntax is:

transform(array(T), function(T, U)) -> array(U)

Returns an array that is the result of applying function to each element of array:

For this example query we’ll also use the VALUES function to provide the test data – VALUES is a handy way to define an inline  table – kind of like a dummy table.  

select 
TRANSFORM(CAST(JSON_PARSE(array) AS ARRAY<MAP<VARCHAR, VARCHAR>>), entry -> entry['id']) as My_Keys 
from (values ('[{"id": 1, "value":"xxx"}, {"id":2, "value":"yy"}]')) t(array);

 My_Keys 
--------
 [1, 2]  

This works by first creating an array containing the input JSON data called ‘array’.  We then use JSON_PARSE which converts a string or array containing JSON text into deserialized JSON values, and we cast those values as an array with two varchar elements.   TRANSFORM then returns an array containing only the elements we want  (we call our desired elements ‘entry’ in this example but it could be called anything).

So with Presto’s rich set of array and JSON functions we can see how to easily extract keys in a nested JSON array object. If you want to get up and running with Presto, Ahana Cloud’s cloud managed service built for AWS is the easiest way to do that. See our docs for more details.