Where can I find different Presto metrics for monitoring?
If you’re wondering where to find Presto metrics for monitoring, there are several options. Let’s look at some of them.
1. Presto Metrics: Presto Console
Presto provides a web interface for monitoring and managing queries. The web interface is served by the Presto coordinator over HTTP, on the port set by the http-server.http.port property in the coordinator’s config properties (default 8080). The console’s main page looks like this:
The main page lists queries along with information such as the unique query ID, query text, query state, percentage completed, username, and the source from which the query originated. Currently running queries are at the top of the page, followed by the most recently completed or failed queries. A query can be in one of several states:
- QUEUED – Query has been accepted and is awaiting execution.
- PLANNING – Query is being planned.
- STARTING – Query execution is being started.
- RUNNING – Query has at least one running task.
- BLOCKED – Query is blocked and is waiting for resources (buffer space, memory, splits, etc.).
- FINISHING – Query is finishing (e.g. commit for autocommit queries).
- FINISHED – Query has finished executing and all output has been consumed.
- FAILED – Query execution failed.
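If you poll query states from a script, it helps to know which states are final. The following is a minimal sketch that models the states listed above; `QueryState`, `TERMINAL_STATES` and `is_done` are hypothetical helpers, not part of any Presto client library, and treating FINISHED and FAILED as terminal is an assumption based on the descriptions above.

```python
from enum import Enum


class QueryState(Enum):
    """Query states as listed in the Presto console (hypothetical helper)."""
    QUEUED = "QUEUED"
    PLANNING = "PLANNING"
    STARTING = "STARTING"
    RUNNING = "RUNNING"
    BLOCKED = "BLOCKED"
    FINISHING = "FINISHING"
    FINISHED = "FINISHED"
    FAILED = "FAILED"


# States after which the query makes no further progress (assumption).
TERMINAL_STATES = frozenset({QueryState.FINISHED, QueryState.FAILED})


def is_done(state_name: str) -> bool:
    """Return True if the state string shown in the console is terminal."""
    return QueryState(state_name.upper()) in TERMINAL_STATES


print(is_done("RUNNING"))   # False
print(is_done("finished"))  # True
```

An unknown state string raises `ValueError`, which is usually what you want in a monitoring script so that new states are noticed rather than silently ignored.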
The following console screenshot shows an executed Presto query; here’s a breakdown of its stats:
- The query was run by the root user, using the presto-cli tool.
- 54 splits were completed during execution.
- It took 5.17 seconds (wall clock time) to run.
- The query consumed 19.7 secs of CPU time. This is greater than the wall clock time because the work ran in parallel across multiple CPUs and cores.
- In terms of (JVM) memory, the query used up to 59.2MB during execution.
- The “Cumulative User Memory” (35.5M) is the sum of user memory consumption across all query stages, as reported in queryStats. Its unit is megabyte-seconds (memory size multiplied by the time it was held), and it covers only the memory used to execute the user’s query, excluding system memory consumption.
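The stats above can be combined into two rough indicators. Dividing CPU time by wall clock time gives the average number of cores kept busy, and dividing cumulative memory (MB-seconds) by wall clock time gives an approximate average amount of user memory held. This is a minimal sketch using the figures from the example query; the function names are hypothetical, and the second ratio assumes the cumulative figure is accumulated over the query’s wall clock duration.

```python
def average_parallelism(cpu_seconds: float, wall_seconds: float) -> float:
    """CPU time / wall time: roughly how many cores were busy on average."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return cpu_seconds / wall_seconds


def average_memory_mb(cumulative_mb_seconds: float, wall_seconds: float) -> float:
    """Cumulative memory (MB-seconds) spread over the wall-clock duration."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return cumulative_mb_seconds / wall_seconds


# Figures from the example query: 19.7 s CPU, 5.17 s wall, 35.5 MB-seconds.
print(round(average_parallelism(19.7, 5.17), 2))  # 3.81
print(round(average_memory_mb(35.5, 5.17), 2))    # 6.87
```

An average well below the peak (59.2MB here) suggests memory use was concentrated in a short part of the query.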
Click on the query ID link (20200925 in this example) and you will see a LOT more detail:
Notice the “Live Plan” tab at the top right, which gives you a graphical representation of the query plan. Read it from the bottom up:
The plan is clickable – click a stage and you can drill down into more detail to monitor Presto metrics.
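Reading a plan from the bottom up means visiting each stage’s child stages before the stage itself, which is a post-order traversal of the stage tree. The following is a minimal sketch over a plain nested dictionary; the `stageId` and `subStages` keys are assumptions chosen for illustration and are not guaranteed to match the exact JSON Presto returns.

```python
from typing import Any, Dict, List


def bottom_up_order(stage: Dict[str, Any]) -> List[str]:
    """Return stage IDs children-first (post-order), i.e. bottom-up plan order.

    The 'stageId' and 'subStages' keys are hypothetical field names.
    """
    order: List[str] = []
    for child in stage.get("subStages", []):
        order.extend(bottom_up_order(child))
    order.append(stage["stageId"])
    return order


# Stage 0 produces the final output; stage 2 is the leaf that scans data.
plan = {
    "stageId": "0",
    "subStages": [
        {"stageId": "1", "subStages": [{"stageId": "2", "subStages": []}]},
    ],
}
print(bottom_up_order(plan))  # ['2', '1', '0']
```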
2. JMX
Java Management Extensions (JMX) provides information about the Java Virtual Machine and all of the software running inside it.
In Presto, JMX information is exposed through the JMX connector, which makes MBean data queryable as tables in the “jmx” catalog. The connector can also be configured to periodically dump chosen MBeans into tables in the “history” schema so that past values can be queried. This is controlled in the catalog properties file: /etc/catalog/jmx.properties
JMX is useful for debugging and monitoring Presto metrics.
Here’s how to query it using presto-cli:
$ presto --catalog jmx --schema current
> select * from jmx.information_schema.tables; -- lists every table in the jmx catalog
> show tables from jmx.information_schema; -- lists the tables that make up the information schema
> select * from jmx.information_schema.views; -- lists all views
The most useful JMX schema for monitoring is “current” which contains every MBean from every node in the Presto cluster. The following query uses the “current” schema to return information from the different Presto memory pools on each node:
> SELECT freebytes, node, object_name FROM jmx.current."com.facebook.presto.memory:*type=memorypool*";
 freebytes |     node     |                       object_name
-----------+--------------+----------------------------------------------------------
 322122547 | 4768d52a6258 | com.facebook.presto.memory:type=MemoryPool,name=reserved
 429496730 | 4768d52a6258 | com.facebook.presto.memory:type=MemoryPool,name=general
(2 rows)
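If you capture presto-cli output like the table above in a script, you can turn it into rows and convert the free bytes into something more readable. This is a minimal sketch that assumes the aligned text layout shown above (a header, a dashed separator, pipe-separated rows, and a “(N rows)” footer); `parse_aligned` and `free_mib_by_pool` are hypothetical helpers, and a Presto client library would normally be a more robust choice than parsing text.

```python
import re
from typing import Dict, List


def parse_aligned(output: str) -> List[Dict[str, str]]:
    """Parse presto-cli aligned table output (layout as above) into row dicts."""
    lines = [ln for ln in output.strip().splitlines() if ln.strip()]
    header = [col.strip() for col in lines[0].split("|")]
    rows: List[Dict[str, str]] = []
    for line in lines[1:]:
        stripped = line.strip()
        if set(stripped) <= set("-+"):
            continue  # the -----+----- separator under the header
        if re.fullmatch(r"\(\d+ rows?\)", stripped):
            continue  # the "(2 rows)" footer
        values = [v.strip() for v in line.split("|")]
        rows.append(dict(zip(header, values)))
    return rows


def free_mib_by_pool(rows: List[Dict[str, str]]) -> Dict[str, float]:
    """Map pool name (the name=... part of object_name) to free memory in MiB."""
    return {
        row["object_name"].rsplit("name=", 1)[-1]: int(row["freebytes"]) / 2**20
        for row in rows
    }


SAMPLE = """
 freebytes |     node     |                       object_name
-----------+--------------+----------------------------------------------------------
 322122547 | 4768d52a6258 | com.facebook.presto.memory:type=MemoryPool,name=reserved
 429496730 | 4768d52a6258 | com.facebook.presto.memory:type=MemoryPool,name=general
(2 rows)
"""

pools = free_mib_by_pool(parse_aligned(SAMPLE))
print({name: round(mib, 1) for name, mib in pools.items()})
# {'reserved': 307.2, 'general': 409.6}
```

In this example the reserved pool has about 0.3 GiB free and the general pool about 0.4 GiB free on node 4768d52a6258.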
More information on monitoring can be found in the Presto documentation. For details on memory-related monitoring, memory management and memory pools, check out this blog post: https://prestodb.io/blog/2019/08/19/memory-tracking
3. REST API
You can make a simple REST call to Presto via the Presto REST API to get a dump of recently run queries using:
The response is in JSON format.
You can optionally specify a query ID – in this example my query ID is 20200926_204458_00000_68x9u:
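To script these REST calls, you only need an HTTP GET and a JSON decoder. This is a minimal sketch using the Python standard library; it assumes the coordinator serves recent queries at /v1/query and a single query at /v1/query/&lt;query ID&gt;, and the http://localhost:8080 address is a placeholder. Secured clusters may also require authentication, which this sketch does not handle.

```python
import json
from typing import Any, Optional
from urllib.parse import quote
from urllib.request import urlopen


def query_url(coordinator: str, query_id: Optional[str] = None) -> str:
    """Build the URL for all recent queries, or for one query ID (assumed paths)."""
    base = coordinator.rstrip("/") + "/v1/query"
    return base if query_id is None else f"{base}/{quote(query_id, safe='')}"


def fetch_json(url: str, timeout: float = 10.0) -> Any:
    """GET a URL and decode the JSON response body."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


coordinator = "http://localhost:8080"  # placeholder coordinator address
print(query_url(coordinator))
print(query_url(coordinator, "20200926_204458_00000_68x9u"))
# Against a live coordinator:
# recent_queries = fetch_json(query_url(coordinator))
```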
4. Third-Party Tools
You can also monitor Presto metrics using third-party tools like Datadog, Prometheus, etc.
The above is not an exhaustive list, but I hope it helps you find different Presto metrics for monitoring.