How do I do geospatial queries and spatial joins in Presto?

A question that often comes up is “how do I do geospatial queries and spatial joins in Presto?”. Fortunately Presto supports a wealth of functions and geospatial-specific joins to get the job done.

Let’s get started with a step-by-step tutorial. First we’ll set-up some test data in two tables.  The first table is trips_table which will store 3 rows each representing simple taxi trips. We store a trip id, the origin and destination long/lat coordinates, and the journey start time:

create table trips_table (
trip_id int, 
orig_long double, 
orig_lat double, 
dest_long double, 
dest_lat double, 
trip_start timestamp);

insert into trips_table values (1, 51.50953, -0.13467, 51.503041, -0.117648, cast('2021-03-02 09:00:00 UTC' as timestamp));
insert into trips_table values (2, 34.039874, -118.258663, 34.044806, -118.235187, cast('2021-03-02 09:30:00 UTC' as timestamp));
insert into trips_table values (3, 48.858965, 2.293497,48.859952, 2.340328, cast('2021-03-02 09:45:00 UTC' as timestamp));
insert into trips_table values (4, 51.505120, -0.089522, 51.472602, -0.489912, cast('2021-03-02 10:45:00 UTC' as timestamp));

For information:

  • Trip 1 is a ride within central London, from Piccadilly Circus to the London Eye.
  • Trip 2 is a ride in downtown Los Angeles
  • Trip 3 is a ride from the Eiffel Tower to Musée du Louvre, Paris 
  • Trip 4 is a ride from Borough Market in central London to Heathrow Airport Terminal 5 (outside central London).

The second table is city_table with each row storing the shape representing an area e.g. central London, and a name for the shape. We represent the shape with a sequence of coordinates that enclose a specific area: 

create table city_table (
geo_shape varchar, 
name varchar);

insert into city_table values ('POLYGON((51.519287 -0.172316,51.519287 -0.084103,51.496393 -0.084103,51.496393 -0.172316,51.519287 -0.172316))', 'London, central');
insert into city_table values('POLYGON((33.9927 -118.3023,33.9902 -118.1794,34.0911 -118.2436,33.9927 -118.3023))', 'Los Angeles, downtown');

For information:

  • I used a simple triangle to represent downtown LA (see illustration below)
  • I used a rectangle representing central London. 
  • In each case the first pair coordinates for the shape are the same as the last pair – so it’s an enclosed bounding box or polygon we’re describing.
  • We’re storing our shapes as text in a varchar column for simplicity.
  • We describe each polygon as comma-separated pairs of long/lat coords using the POLYGON(()) function.  The double brackets are required. 
Screen Shot 2021 03 04 at 12.26.17 PM
Simple polygon to represent downtown LA 
(credit: “Geospatial Data Analysis using SQL”)

Now let’s run a query to count how many trips occurred in each city. We join our two tables, and we use each journey’s originating long/lat coordinates to determine – using ST_CONTAINS() – if that point exists in any of our shapes.  This function requires the polygon to be expressed as a special type – Geometry – so we convert our shape from text using ST_GeometryFromText() function:

SELECT c.name as City, count(*) as Trips 
FROM trips_table as t 
JOIN city_table as c 
ON ST_Contains(ST_GeometryFromText(c.geo_shape), st_point(t.orig_long, t.orig_lat)) 
GROUP BY 1;

         City          | Trips 
-----------------------+-------
 Los Angeles, downtown |     1 
 London, central       |     2 
(2 rows)

We see both London trips made it into the result set, despite one of the trips ending at the airport which is a way outside the shape we defined for central London – this is because the query uses the originating coordinates for each trip, not the destination coordinates. 

Also notice the Paris trip didn’t make it into the result – this is because we did not define a shape for Paris.

In this example you’ve seen some of the benefits of using Ahana Cloud for Presto

Presto’s Geospatial functions are listed in the Presto documentation.  

We have a ton of resources to help you get started with Presto, check them out here.