Presto Geospatial Functions Tutorial
A question that often comes up is “how do I do geospatial queries and spatial joins in Presto?”. Fortunately Presto supports a wealth of functions and geospatial-specific joins to get the job done.
Let’s get started with a step-by-step tutorial. First we’ll set-up some test data in two tables. The first table is trips_table which will store 3 rows each representing simple taxi trips. We store a trip id, the origin and destination long/lat coordinates, and the journey start time:
create table trips_table (
insert into trips_table values (1, 51.50953, -0.13467, 51.503041, -0.117648, cast(‘2021-03-02 09:00:00 UTC’ as timestamp));
insert into trips_table values (2, 34.039874, -118.258663, 34.044806, -118.235187, cast(‘2021-03-02 09:30:00 UTC’ as timestamp));
insert into trips_table values (3, 48.858965, 2.293497,48.859952, 2.340328, cast(‘2021-03-02 09:45:00 UTC’ as timestamp));
insert into trips_table values (4, 51.505120, -0.089522, 51.472602, -0.489912, cast(‘2021-03-02 10:45:00 UTC’ as timestamp));
- Trip 1 is a ride within central London, from Piccadilly Circus to the London Eye.
- Trip 2 is a ride in downtown Los Angeles
- Trip 3 is a ride from the Eiffel Tower to Musée du Louvre, Paris
- Trip 4 is a ride from Borough Market in central London to Heathrow Airport Terminal 5 (outside central London).
The second table is city_table with each row storing the shape representing an area e.g. central London, and a name for the shape. We represent the shape with a sequence of coordinates that enclose a specific area:
create table city_table (
insert into city_table values (‘POLYGON((51.519287 -0.172316,51.519287 -0.084103,51.496393 -0.084103,51.496393 -0.172316,51.519287 -0.172316))’, ‘London, central’);
insert into city_table values(‘POLYGON((33.9927 -118.3023,33.9902 -118.1794,34.0911 -118.2436,33.9927 -118.3023))’, ‘Los Angeles, downtown’);
- I used a simple triangle to represent downtown LA (see illustration below)
- I used a rectangle representing central London.
- In each case the first pair coordinates for the shape are the same as the last pair – so it’s an enclosed bounding box or polygon we’re describing.
- We’re storing our shapes as text in a varchar column for simplicity.
- We describe each polygon as comma-separated pairs of long/lat coords using the POLYGON(()) function. The double brackets are required.
Now lets run a query to count how many trips occurred in each city. We join our two tables, and we use each journey’s originating long/lat coordinates to determine – using ST_CONTAINS() – if that point exists in any of our shapes. This function requires the polygon to be expressed as a special type – Geometry – so we convert our shape from text using ST_GeometryFromText() function:
SELECT c.name as City, count(*) as Trips FROM trips_table as t JOIN city_table as c ON ST_Contains(ST_GeometryFromText(c.geo_shape), st_point(t.orig_long, t.orig_lat)) GROUP BY 1;
|Los Angeles, downtown||1|
We see both London trips made it into the result set, despite one of the trips ending at the airport which is a way outside the shape we defined for central London – this is because the query uses the originating coordinates for each trip, not the destination coordinates.
Also notice the Paris trip didn’t make it into the result – this is because we did not define a shape for Paris.
In this example you’ve seen some of the benefits of using Ahana Cloud for Presto
Presto’s Geospatial functions are listed in the Presto documentation.
Check out our other Learning Center content at https://ahana.io/learn/ where you’ll find useful tutorials and examples, from using Python with Presto, to Machine Learning and more. We also cover comparisons between Presto and other technologies.
Also check https://ahana.io/answers/ for answers to frequently asked questions on all things Presto.