Tutorial: How to define SQL functions with Presto across all connectors

share img define sql functions 1200x625 2

Presto is a widely used SQL query engine for data lakes that comes equipped with many built-in functions to serve most use cases. However, there may be certain situations where you need to define your own function. Luckily, Presto allows you to define dynamic expressions as SQL functions, which are stored separately from the Presto source code and managed by a functions namespace manager. You can even set up this manager with a MySQL database. This feature is so popular at Facebook that they have over a thousand functions defined in their instance of Presto.

Function Namespace Manager

By organizing data in catalogs and schemas, Presto allows users to easily access and manipulate data from multiple sources as if they were a single database.

A catalog is a logical namespace that represents a collection of data sources that can be queried in Presto. Each catalog contains one or more schemas, which are essentially named containers that hold tables, views, and other objects.

A function namespace is a special catalog.schema that stores functions in the format like mysql.test. It is possible to make each catalog.schema a function namespace. A function namespace manager is a type of plugin that handles a collection of these function catalog schemas. Catalogs can be assigned to connectors in Presto which allows the Presto engine to carry out tasks like creating, modifying, and deleting functions.

This user defined function management is separated from connector API for flexibility, hence these SQL functions can be used across all connectors. Further, the query is guaranteed to use the same version of the function throughout the execution and any modification to the functions is versioned. 


Today, function namespace manager is implemented with the help of MySQL, so users need to have a running MySQL service to initialize the MySQL based function namespace manager. 

Step 1: Provision MySQL server and generate jdbc url for further access.

Suppose the MySQL server can be reached at localhost:3306, example database url – 


Step 2: Create database & tables in MySQL database to store function namespace manager related data

 USE presto;

Step 3: Configure at Presto [2]

Create Function namespace manager configuration under etc/function-namespace/mysql.properties:

function-namespace-manager.name=mysql database-url=jdbc:mysql://localhost:3306/presto?user=root&password=password

And restart the Presto Service.

Step 4: Create new function namescape

After starting the Presto server, the following tables will appear under the Presto database, which is used to manage function namespace, in MySQL –

mysql> show tables;
| Tables_in_presto    |
| enum_types          |
| function_namespaces |
| sql_functions       |
93 rows in set (0.00 sec)

To create a new function namespace ”ahana.default”, insert into the function_namespaces table:

INSERT INTO function_namespaces (catalog_name, schema_name)
    VALUES('ahana', 'default');

Step 5: Create a function and query from Presto [1]

SQL functions_blog

Here is simple example of SQL function for COSECANT: 

presto>CREATE OR REPLACE FUNCTION ahana.default.cosec(x double)
RETURNS double
COMMENT ‘Cosecant trigonometric function'
RETURN 1 / sin(x);

More examples can be found at https://prestodb.io/docs/current/sql/create-function.html#examples [1]

Step 6: Apply the newly created function and SQL query

SQL functions_blog

It is required for users to use fully qualified function name while using in SQL query.

Following the the example of using cosec SQL function in the query. 

presto> select ahana.default.cosec (50) as Cosec_value;
(1 row)

Query 20211103_211533_00002_ajuyv, FINISHED, 1 node
Splits: 33 total, 33 done (100.00%)
0:00 [0 rows, 0B] [0 rows/s, 0B/s

Here is another simple example of creating an EpochTimeToLocalDate function to convert Unix time to local timezone under ahana.default function namespace.

presto> CREATE FUNCTION ahana.default. EpochTimeToLocalDate (x bigint) 
     -> RETURNS timestamp 
     -> RETURN from_unixtime (x);

ahana.default.EpochTimeToLocalDate(1629837828) as date;
 2021-08-24 13:43:48.000 
(1 row)


 function-namespaces-table-name  <The name of the table that stores all the function namespaces managed by this manager> property can be used if there is a use case  to instantiate multiple function namespace managers,  otherwise if we can create functions in only one function namespace manager then it can be utilized across all different databases or connectors. [2]

At Ahana we have simplified all these steps that is MySQL container, Schema, databases, tables and additional configurations required to manage functions and data platforms users just need to create their own SQL functions and use them in SQL queries, that’s it, no need to be wary about provisioning and managing additional MySQL servers. 

Future Roadmap

Remote function Support with remote UDF thrift API 

Allows you to run arbitrary functions that are either not safe or not possible to run within worker JVM: unreliable Java functions, C++, Python, etc.


[1] DDL Syntax to use FUNCTIONS

[2] Function Namespace Manager Documentation