How do I do geospatial queries and spatial joins in Presto?
A question that often comes up is “how do I do geospatial queries and spatial joins in Presto?”. Fortunately Presto supports a wealth of functions and geospatial-specific joins to get the job done.
Let’s get started with a step-by-step tutorial. First we’ll set-up some test data in two tables. The first table is trips_table which will store 3 rows each representing simple taxi trips. We store a trip id, the origin and destination long/lat coordinates, and the journey start time:
create table trips_table ( trip_id int, orig_long double, orig_lat double, dest_long double, dest_lat double, trip_start timestamp); insert into trips_table values (1, 51.50953, -0.13467, 51.503041, -0.117648, cast('2021-03-02 09:00:00 UTC' as timestamp)); insert into trips_table values (2, 34.039874, -118.258663, 34.044806, -118.235187, cast('2021-03-02 09:30:00 UTC' as timestamp)); insert into trips_table values (3, 48.858965, 2.293497,48.859952, 2.340328, cast('2021-03-02 09:45:00 UTC' as timestamp)); insert into trips_table values (4, 51.505120, -0.089522, 51.472602, -0.489912, cast('2021-03-02 10:45:00 UTC' as timestamp));
- Trip 1 is a ride within central London, from Piccadilly Circus to the London Eye.
- Trip 2 is a ride in downtown Los Angeles
- Trip 3 is a ride from the Eiffel Tower to Musée du Louvre, Paris
- Trip 4 is a ride from Borough Market in central London to Heathrow Airport Terminal 5 (outside central London).
The second table is
city_table with each row storing the shape representing an area e.g. central London, and a name for the shape. We represent the shape with a sequence of coordinates that enclose a specific area:
create table city_table ( geo_shape varchar, name varchar); insert into city_table values ('POLYGON((51.519287 -0.172316,51.519287 -0.084103,51.496393 -0.084103,51.496393 -0.172316,51.519287 -0.172316))', 'London, central'); insert into city_table values('POLYGON((33.9927 -118.3023,33.9902 -118.1794,34.0911 -118.2436,33.9927 -118.3023))', 'Los Angeles, downtown');
- I used a simple triangle to represent downtown LA (see illustration below)
- I used a rectangle representing central London.
- In each case the first pair coordinates for the shape are the same as the last pair – so it’s an enclosed bounding box or polygon we’re describing.
- We’re storing our shapes as text in a varchar column for simplicity.
- We describe each polygon as comma-separated pairs of long/lat coords using the
POLYGON(())function. The double brackets are required.
Now let’s run a query to count how many trips occurred in each city. We join our two tables, and we use each journey’s originating long/lat coordinates to determine – using
ST_CONTAINS() – if that point exists in any of our shapes. This function requires the polygon to be expressed as a special type – Geometry – so we convert our shape from text using
SELECT c.name as City, count(*) as Trips FROM trips_table as t JOIN city_table as c ON ST_Contains(ST_GeometryFromText(c.geo_shape), st_point(t.orig_long, t.orig_lat)) GROUP BY 1; City | Trips -----------------------+------- Los Angeles, downtown | 1 London, central | 2 (2 rows)
We see both London trips made it into the result set, despite one of the trips ending at the airport which is a way outside the shape we defined for central London – this is because the query uses the originating coordinates for each trip, not the destination coordinates.
Also notice the Paris trip didn’t make it into the result – this is because we did not define a shape for Paris.
In this example you’ve seen some of the benefits of using Ahana Cloud for Presto.
Presto’s Geospatial functions are listed in the Presto documentation.
We have a ton of resources to help you get started with Presto, check them out here.