Tutorial: How to define SQL functions with Presto across all connectors

Presto is the open source SQL query engine for data lakes. It supports many native functions which are usually sufficient for most use cases. However, there is may be a corner case where you need to implement your own function. To simplify this, Presto allows users to define expressions as SQL functions. These are dynamic functions separated from the Presto source code, managed by a functions namespace manager that you can set up with a MySQL database. In fact, this is one of the most widely used features of Presto at Facebook, with over 1000s of functions defined.

Function Namespace Manager

A function namespace is a special catalog.schema that stores functions in the format like mysql.test. Each catalog.schema can be a function namespace. A function namespace manager is a plugin that manages a set of these function catalog schemas. Catalog can be mapped to connectors in Presto (a connector for functions, no tables or view) and allows the Presto engine to perform actions such as creating, altering, and deleting functions.

This user defined function management is separated from connector API for flexibility, hence these SQL functions can be used across all connectors. Further, the query is guaranteed to use the same version of the function throughout the execution and any modification to the functions is versioned. 

Implementation

Today, function namespace manager is implemented with the help of MySQL, so users need to have a running MySQL service to initialize the MySQL based function namespace manager. 

Step 1: Provision MySQL server and generate jdbc url for further access.

Suppose the MySQL server can be reached at localhost:3306, example database url – 

jdbc:mysql://localhost:3306/presto?user=root&password=password

Step 2: Create database & tables in MySQL database to store function namespace manager related data

 CREATE DATABASE presto;
 USE presto;

Step 3: Configure at Presto [2]

Create Function namespace manager configuration under etc/function-namespace/mysql.properties:

function-namespace-manager.name=mysql database-url=jdbc:mysql://localhost:3306/presto?user=root&password=password
function-namespaces-table-name=function_namespaces
functions-table-name=sql_functions

And restart the Presto Service.

Step 4: Create new function namescape

Now once the Presto server is started we will see below tables under presto database (which is being used to manage function namespace) in Mysql –

mysql> show tables;
+---------------------+
| Tables_in_presto    |
+---------------------+
| enum_types          |
| function_namespaces |
| sql_functions       |
+---------------------+
93 rows in set (0.00 sec)

To create a new function namespace ”ahana.default”, insert into the function_namespaces table:

INSERT INTO function_namespaces (catalog_name, schema_name)
    VALUES('ahana', 'default');

Step 5: Create a function and query from Presto [1]

SQL functions_blog

Here is simple example of SQL function for COSECANT: 

presto>CREATE OR REPLACE FUNCTION ahana.default.cosec(x double)
RETURNS double
COMMENT ‘Cosecant trigonometric function'
LANGUAGE SQL
DETERMINISTIC
RETURNS NULL ON NULL INPUT
RETURN 1 / sin(x);

More examples can be found at https://prestodb.io/docs/current/sql/create-function.html#examples [1]

Step 6: Apply the newly created function and SQL query

SQL functions_blog

It is required for users to use fully qualified function name while using in SQL query.

Following the the example of using cosec SQL function in the query. 

presto> select ahana.default.cosec (50) as Cosec_value;
     Cosec_value     
---------------------
 -3.8113408578721053 
(1 row)

Query 20211103_211533_00002_ajuyv, FINISHED, 1 node
Splits: 33 total, 33 done (100.00%)
0:00 [0 rows, 0B] [0 rows/s, 0B/s

Here is another simple example of creating an EpochTimeToLocalDate function to convert Unix time to local timezone under ahana.default function namespace.

presto> CREATE FUNCTION ahana.default. EpochTimeToLocalDate (x bigint) 
     -> RETURNS timestamp 
     -> LANGUAGE SQL 
     -> DETERMINISTIC RETURNS NULL ON NULL INPUT 
     -> RETURN from_unixtime (x);
CREATE FUNCTION

ahana.default.EpochTimeToLocalDate(1629837828) as date;
          date           
-------------------------
 2021-08-24 13:43:48.000 
(1 row)

Note

 function-namespaces-table-name  <The name of the table that stores all the function namespaces managed by this manager> property can be used if there is a use case  to instantiate multiple function namespace managers,  otherwise if we can create functions in only one function namespace manager then it can be utilized across all different databases or connectors. [2]

At Ahana we have simplified all these steps that is MySQL container, Schema, databases, tables and additional configurations required to manage functions and data platforms users just need to create their own SQL functions and use them in SQL queries, that’s it, no need to be wary about provisioning and managing additional MySQL servers. 

Future Roadmap

Remote function Support with remote UDF thrift API 

Allows you to run arbitrary functions that are either not safe or not possible to run within worker JVM: unreliable Java functions, C++, Python, etc.

References

[1] DDL Syntax to use FUNCTIONS

[2] Function Namespace Manager Documentation