How do I query JSON documents with Presto?

JSON documents are a common data type. A lot of people collect logs and load them into S3. Querying JSON with Presto can be challenging because you may not always have a consistent schema. However, the great part about Presto is that it has functionality that enables you to get insights on your JSON data.

Presto has a JSON data type. The JSON data type is similar to a string. One way to think about this is that it is just a string containing JSON, with the exception that the data inside the string has already been parsed, it’s already well-formed JSON.

But it’s also slightly semantically different in that JSON has maps. And when you have a string with a map, the keys of the map can be ordered differently. But in the JSON data type, the keys in the map are always going to be in a consistent order. And it’s worth noting that two maps of the data with different JSON data type values will compare the same, even if the keys are in different order. This can be important when doing an aggregation regrouping or a comparison. Logically two maps will be treated as the same value, no matter what the order is.  

How the JSON parse function works

The JSON parse function takes a JSON string and returns a JSON data type. Here is an example of the JSON data type:

SELECT json_parse('null'); -- JSON 'null'
SELECT json_parse('true'); -- JSON 'true'
SELECT json_parse('42'); -- JSON '42'
SELECT json_parse('"abc"'); -- JSON '"abc"'
SELECT json_parse('[1, 2, 3]'); -- JSON '[1,2,3]'
SELECT json_parse('["a": 1, "b": 2]'); -- JSON '["a": 1, "b": 2]'
SELECT json_parse('[”hey”, 42, {“xyz”: 100, “abc” : false}]’); 

After its parts with json_parse, the result is that the keys in the map are now ordered. This is how you result with the same JSON data type and how they compare the same. When two different maps with differently ordered keys are parsed, they’ll have the same contents.

In Presto, all the JSON functions take the JSON data type. And if you pass a string into the JSON function, it will be implicitly parsed into JSON.

Another question that comes up is what is the difference of NULL between JSON and SQL? J_null and a SQL null?

This can be confusing because JSON has a NULL and so does SQL. These are two different things. A SQL NULL is considered a special value and work very differently like they do in other programming languages. Operations on a NULL gets turned into a NULL. Whereas a JSON NULL is just another value, like a number or a string or an array. So they’re totally different. This has implications when doing Boolean expressions.

We have a lot more resources on Presto if you’d like to learn more, check out our Getting Started with Presto page.