Presto offers several classes of mathematical functions that operate on single values and mathematical operators that allow for operations on values across columns. In addition, aggregate functions can operator on a set of values to compute a single result.

The mathematical functions are broken into four subcategories: 1. mathematical, 2. statistical, 3. trigonometric, and 4. floating point. The majority fall into the mathematical category and we’ll discuss them separately. The statistical functions are quite sparse with two functions that compute the lower and upper bound of the Wilson score interval of a Bernoulli process. The trigonometric functions are what you’d expect (e.g. `sin`

, `cos`

, `tan`

, etc.). The floating point functions are really functions that handle not-a-number and infinite use cases.

The mathematical functions subcategory further fall into another layer of classification:

- Functions that perform coarser approximation, such as rounding and truncation:
`abs`

,`ceiling`

(`ceil`

),`floor`

,`round`

,`sign`

,`truncate`

- Conversions:
`degrees`

,`radians`

,`from_base`

,`to_base`

- Exponents, logarithms, roots:
`exp`

,`ln`

,`log2`

,`log10`

,`power`

(`pow`

),`cbrt`

,`sqrt`

- Convenient constants, such as
`pi()`

,`e()`

,`random`

(`rand`

) - Cumulative distribution functions (and inverses):
`binomial_cdf`

,`inverse_binomial_cdf`

,`cauchy_cdf`

,`inverse_cauchy_cdf`

,`chi_squared_cdf`

,`inverse_chi_squared_cdf`

,`normal_cdf`

,`inverse_normal_cdf`

,`poisson_cdf`

,`inverse_poisson_cdf`

,`weibull_cdf`

,`inverse_weibull_cdf`

,`beta_cdf`

,`inverse_beta_cdf`

,`width_bucket`

- Miscellaneous:
`mod`

,`cosine_similarity`

The mathematical operators are basic arithmetic operators, such as addition (`+`

), subtraction (`-`

), multiplication (`*`

), and modulus (`%`

).

Let’s apply these mathematical functions in an example. In the following query, have a floating-point column `x`

to which we apply several mathematical functions that are representative of the subcategories we discussed previously, including: radians (conversion), natural log, the Normal CDF, modulo, random number, and operators.

```
select
x,
radians(x) as radians_x, /* convert to radians */
ln(x) as ln_x, /* natural log */
normal_cdf(0,30,x) as_normal_cdf_x, /* Normal CDF */
mod(x,2) as mod_x_2, /* Modulo 2 */
random() as r, /* Random number */
3*((x/2)+2) as formula /* Formula using operators */
from
example;
```

The following is the output the above query with some rounding for ease of viewing.

So, far we see that mathematical functions, as they are classified in Presto, operate on single values. What this means is that given a column of values, each function is applied element-wise to that column. Aggregate functions allow us to look across a set of values.

Like mathematical functions, aggregate functions are also broken into subcategories: 1. general, 2. bitwise, 3. map, 4. approximate, 5. statistical, 6. classification metrics, and 7. differential entropy. We will discuss the general and approximate subcategory separately.

The bitwise aggregate functions are two functions that return the bitwise `AND`

and bitwise `OR`

or all input values in 2’s complement representation. The map aggregate functions provide convenient map creation functions from input values. The statistical aggregate functions are standard summary statistic functions you would expect, such as `stddev`

, `variance`

, `kurtosis`

, and `skewness`

. The classification metrics and differential entropy aggregate functions are specialized functions that make it easy to analyze binary classification predictive modelling and model binary differential entropy, respectively.

The general functions subcategory further fall into another layer of classification:

- Common summarizations:
`count`

,`count_if`

,`min`

,`max`

,`min_by`

,`max_by`

,`sum`

,`avg`

,`geometric_mean`

,`checksum`

- Boolean tests:
`bool_or`

,`bool_and`

,`every`

- Data structure consolidation:
`array_agg`

,`set_agg`

,`set_union`

- Miscellaneous:
`reduce_agg`

,`arbitrary`

Again, let’s apply these aggregate functions in a series of representative examples. In the following query, we apply a series of basic aggregations to our floating-point column `x`

.

```
select
sum(x) as sum_x,
count(x) as count_x,
min(x) as min_x,
max(x) as max_x,
avg(x) as avg_x,
checksum(x) as ckh_x
from
example;
```

The following is the output the above query.

In the following query, we showcase a boolean test with the `bool_or`

function. We know that the natural log will return a `NaN`

for negative values of `x`

. So, if we apply the `is_nan`

check, we expect `x`

to always be `false`

, but for our `ln`

result to occasionally be `true`

. Finally, if we were to do the `bool_or`

aggregation on our `is_nan`

functions, we expect the column derived from `x`

to be `false`

(i.e. no `true`

at all) and the column derived fro `ln(x)`

to be `true`

(i.e. at least one true value). The following query and accompanying result illustrate this.

```
with nan_test as (
select
is_nan(x) as is_nan_x,
is_nan(ln(x)) as is_nan_ln_x
from
example
)
select
bool_or(is_nan_x) as any_nan_x_true,
bool_or(is_nan_ln_x) as any_nan_ln_x_true
from
nan_test;
```

This final example illustrates the use of an example of data consolidation, taking a `x`

and `radians(x)`

columns and creating a single row with a map data structure.

```
with rad as(select x, radians(x) as rad_x from example)
select map_agg(x, rad_x) from rad;
```

The approximate aggregate functions provide approximate results for aggregate large data sets, such as distinct values (`approx_distinct`

), percentiles (`approx_percentile`

), and histograms (`numeric_histogram`

). In fact, we have a short answer post on how to use the `approx_percentile`

function. Several of the approximate aggregate functions rely on other functions and data structures: quantile digest, HyperLogLog and KHyperLogLog.

A natural extension to aggregate functions are window functions, which perform calculations across rows of a query results. In fact, all aggregate functions can be used as window functions by adding an `OVER`

clause. One popular application of window functions is time-series analysis. In particular, the `lag`

function window function is quite useful. We have a short answer post on how to use the `lag`

window function and to compute differences in dates using the `lag`

window function.

This was short article was a high-level overview, and you are encouraged to review the Presto public documentation for Mathematical Functions and Operations and Aggregate Functions. If you want to get started with Presto easily, check out Ahana Cloud. It’s SaaS for Presto and takes away all the complexities of tuning, management and more. It’s free to try out for 14 days, then it’s pay-as-you-go through the AWS marketplace.