Tutorial 103: PrestoDB cluster on GCP

Introduction

This tutorial is Part III of our Getting started with PrestoDB series. As a reminder, PrestoDB is an open source distributed SQL query engine. In Tutorial 102 we covered how to run a three-node PrestoDB cluster on a laptop. In this tutorial, we'll show you how to run a PrestoDB cluster in a GCP environment using VM instances and GKE containers.

Environment

This guide was developed on GCP VM instances and GKE containers.

Presto on GCP with VMs

Implementation steps for PrestoDB on VM instances

Step 1: Create a GCP VM instance using the CREATE INSTANCE tab and name it presto-coordinator. Next, create three more VM instances named presto-worker1, presto-worker2 and presto-worker3 respectively.

Step 2: By default, GCP blocks all network ports, and PrestoDB needs ports 8080-8083 open. Create a rule under the Firewall rules tab to enable them.
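The same rule can also be created from the command line. A minimal sketch, assuming the default VPC network and a rule name of presto-ports (both are assumptions, not values from this tutorial):

```shell
# Allow the Presto coordinator/worker HTTP ports (assumed rule name and network)
gcloud compute firewall-rules create presto-ports \
    --network=default \
    --direction=INGRESS \
    --allow=tcp:8080-8083 \
    --source-ranges=0.0.0.0/0   # wide open for the tutorial; narrow this in production
```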

Step 3: 

Install Java and Python, which Presto requires on every node.
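Presto 0.235 runs on Java 8. On a Debian-based VM image the installation might look like this (package names are assumptions and vary by distribution):

```shell
sudo apt-get update
sudo apt-get install -y openjdk-8-jdk python
java -version   # should report a 1.8.x runtime
```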

Step 4:

Download the Presto server tarball, presto-server-0.235.1.tar.gz, and unpack it. The tarball contains a single top-level directory, presto-server-0.235.1, which we will call the installation directory.

Run the commands below to download the official tarballs for presto-server and presto-cli from prestodb.io:

user@presto-coordinator-1:~$ curl -O https://repo1.maven.org/maven2/com/facebook/presto/presto-server/0.235.1/presto-server-0.235.1.tar.gz
 % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                Dload  Upload   Total   Spent    Left  Speed
100  721M  100  721M    0     0   245M      0  0:00:02  0:00:02 --:--:--  245M
user@presto-coordinator-1:~$ curl -O https://repo1.maven.org/maven2/com/facebook/presto/presto-cli/0.235.1/presto-cli-0.235.1-executable.jar
 % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                Dload  Upload   Total   Spent    Left  Speed
100 12.7M  100 12.7M    0     0  15.2M      0 --:--:-- --:--:-- --:--:-- 15.1M
user@presto-coordinator-1:~$

Step 5:

Use gunzip and tar to unpack the presto-server tarball:

user@presto-coordinator-1:~$ gunzip presto-server-0.235.1.tar.gz ; tar -xf presto-server-0.235.1.tar

Step 6: (optional)

Rename the directory to drop the version number:

user@presto-coordinator-1:~$ mv presto-server-0.235.1 presto-server

Step 7:  

Create the etc, etc/catalog and data directories:

user@presto-coordinator-1:~/presto-server$ mkdir etc etc/catalog data

Step 8:

Define the etc/node.properties, etc/config.properties, etc/jvm.config, etc/log.properties and etc/catalog/jmx.properties files as below for the Presto coordinator.

user@presto-coordinator-1:~/presto-server$ cat etc/node.properties
node.environment=production
node.id=ffffffff-ffff-ffff-ffff-ffffffffffff
node.data-dir=/home/user/presto-server/data

user@presto-coordinator-1:~/presto-server$ cat etc/config.properties
coordinator=true
node-scheduler.include-coordinator=false
http-server.http.port=8080
query.max-memory=50GB
query.max-memory-per-node=1GB
query.max-total-memory-per-node=2GB
discovery-server.enabled=true
discovery.uri=http://localhost:8080

user@presto-coordinator-1:~/presto-server$ cat etc/jvm.config
-server
-Xmx16G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:+ExitOnOutOfMemoryError
-Djdk.attach.allowAttachSelf=true

user@presto-coordinator-1:~/presto-server$ cat etc/log.properties
com.facebook.presto=INFO

user@presto-coordinator-1:~/presto-server$ cat etc/catalog/jmx.properties
connector.name=jmx
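With the configuration files in place, the server can be started with the launcher script shipped in the installation directory (bin/launcher run keeps it in the foreground, which is handy for debugging):

```shell
# Start Presto as a background daemon
bin/launcher start
# Confirm the process is up
bin/launcher status
# Server logs land under the node.data-dir configured above
tail -n 20 data/var/log/server.log
```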

Step 9:

Check the cluster status in the Presto UI at http://<coordinator-ip>:8080. It should show the Active worker count as 0, since only the coordinator is enabled so far.

Step 10: 

Repeat steps 1 to 8 on the remaining three VM instances, which will act as worker nodes.

When configuring the worker nodes, set coordinator to false and http-server.http.port to 8081, 8082 and 8083 for worker1, worker2 and worker3 respectively.

Also make sure node.id and http-server.http.port are different for each worker node.
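The per-worker files can also be generated in one pass. A minimal sketch that stages all three workers' configuration locally (the worker*-etc directory names are an assumption; copy each directory's contents to etc/ on the matching VM afterwards):

```shell
#!/bin/bash
# Stage config for three workers with unique ports and node ids.
for i in 1 2 3; do
  dir="worker${i}-etc"
  mkdir -p "${dir}"
  cat > "${dir}/config.properties" <<EOF
coordinator=false
http-server.http.port=$((8080 + i))
query.max-memory=50GB
query.max-memory-per-node=1GB
query.max-total-memory-per-node=2GB
discovery.uri=http://presto-coordinator-1:8080
EOF
  # node.id must be unique across the cluster; take a random UUID from the kernel (Linux)
  cat > "${dir}/node.properties" <<EOF
node.environment=production
node.id=$(cat /proc/sys/kernel/random/uuid)
node.data-dir=/home/user/presto-server/data
EOF
done
grep -H 'http-server.http.port' worker*-etc/config.properties
```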

user@presto-worker1:~/presto-server$ cat etc/node.properties
node.environment=production
node.id=ffffffff-ffff-ffff-ffff-fffffffffff1
node.data-dir=/home/user/presto-server/data
user@presto-worker1:~/presto-server$ cat etc/config.properties
coordinator=false
http-server.http.port=8081
query.max-memory=50GB
query.max-memory-per-node=1GB
query.max-total-memory-per-node=2GB
discovery.uri=http://presto-coordinator-1:8080

user@presto-worker1:~/presto-server$ cat etc/jvm.config
-server
-Xmx16G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:+ExitOnOutOfMemoryError
-Djdk.attach.allowAttachSelf=true

user@presto-worker1:~/presto-server$ cat etc/log.properties
com.facebook.presto=INFO

user@presto-worker1:~/presto-server$ cat etc/catalog/jmx.properties
connector.name=jmx

Step 11: 

Check the cluster status again; it should now show the three worker nodes as part of the PrestoDB cluster.

Step 12:

Verify the PrestoDB environment by running the PrestoDB CLI with a simple JMX query:
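The ./presto-cli invocation below assumes the CLI jar downloaded in Step 4 has been renamed and made executable, for example:

```shell
mv ~/presto-cli-0.235.1-executable.jar ./presto-cli
chmod +x presto-cli
# Point the CLI at the coordinator; --catalog and --schema set session defaults
./presto-cli --server localhost:8080 --catalog jmx --schema current
```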

user@presto-coordinator-1:~/presto-server$ ./presto-cli
presto> SHOW TABLES FROM jmx.current;
                                                              Table                                                              
-----------------------------------------------------------------------------------------------------------------------------------
com.facebook.airlift.discovery.client:name=announcer                                                                             
com.facebook.airlift.discovery.client:name=serviceinventory                                                                      
com.facebook.airlift.discovery.store:name=dynamic,type=distributedstore                                                          
com.facebook.airlift.discovery.store:name=dynamic,type=httpremotestore                                                           
com.facebook.airlift.discovery.store:name=dynamic,type=replicator


Implementation steps for PrestoDB on GKE containers

Step 1:

Go to the Google Cloud Console and activate the Cloud Shell window.

Step 2:

Create an Artifact Registry repository using the command below, replacing REGION with the region where you would like to create the repository:

gcloud artifacts repositories create ahana-prestodb \
   --repository-format=docker \
   --location=REGION \
   --description="Docker repository"
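Before Docker can push images to this repository (as done in later steps), it needs credentials for the registry host; this is typically configured once per region:

```shell
# Register gcloud as a Docker credential helper for the Artifact Registry host
gcloud auth configure-docker REGION-docker.pkg.dev
```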

Step 3:

Create the container cluster using the gcloud commands below:

user@cloudshell:~ (weighty-list-324021)$ gcloud config set compute/zone us-central1-c
Updated property [compute/zone].

user@cloudshell:~ (weighty-list-324021)$ gcloud container clusters create prestodb-cluster01

Creating cluster prestodb-cluster01 in us-central1-c…done.
Created 
.
.
.

kubeconfig entry generated for prestodb-cluster01.
NAME                LOCATION       MASTER_VERSION   MASTER_IP     MACHINE_TYPE  NODE_VERSION     NUM_NODES  STATUS
prestodb-cluster01  us-central1-c  1.20.8-gke.2100  34.72.76.205  e2-medium     1.20.8-gke.2100  3          RUNNING
user@cloudshell:~ (weighty-list-324021)$

Step 4:

After the container cluster is created, run the following command to see its three nodes:

user@cloudshell:~ (weighty-list-324021)$ kubectl get nodes
NAME                                                STATUS   ROLES    AGE     VERSION
gke-prestodb-cluster01-default-pool-34d21367-25cw   Ready    <none>   7m54s   v1.20.8-gke.2100
gke-prestodb-cluster01-default-pool-34d21367-7w90   Ready    <none>   7m54s   v1.20.8-gke.2100
gke-prestodb-cluster01-default-pool-34d21367-mwrn   Ready    <none>   7m53s   v1.20.8-gke.2100
user@cloudshell:~ (weighty-list-324021)$

Step 5:

Pull the PrestoDB sandbox Docker image:

user@cloudshell:~ (weighty-list-324021)$ docker pull ahanaio/prestodb-sandbox

Step 6:

Deploy ahanaio/prestodb-sandbox locally in the shell as a container named coordinator; we will customize it and later deploy the resulting image to the container cluster.

user@cloudshell:~ (weighty-list-324021)$ docker run -d -p 8080:8080 -it --name coordinator ahanaio/prestodb-sandbox
391aa2201e4602105f319a2be7d34f98ed4a562467e83231913897a14c873fd0

Step 7:

Edit the etc/config.properties file inside the container and set the node-scheduler.include-coordinator property to false. Then restart the coordinator.

user@cloudshell:~ (weighty-list-324021)$ docker exec -i -t coordinator bash                                                                                                                       
bash-4.2# vi etc/config.properties
bash-4.2# cat etc/config.properties
coordinator=true
node-scheduler.include-coordinator=false
http-server.http.port=8080
discovery-server.enabled=true
discovery.uri=http://localhost:8080
bash-4.2# exit
exit
user@cloudshell:~ (weighty-list-324021)$ docker restart coordinator
coordinator

Step 8:

Now run docker commit to create a new local image from the container, then tag that image ID as coordinator.

user@cloudshell:~ (weighty-list-324021)$ docker commit coordinator
sha256:46ab5129fe8a430f7c6f42e43db5e56ccdf775b48df9228440ba2a0b9a68174c

user@cloudshell:~ (weighty-list-324021)$ docker images
REPOSITORY                 TAG       IMAGE ID       CREATED          SIZE
<none>                     <none>    46ab5129fe8a   15 seconds ago   1.81GB
ahanaio/prestodb-sandbox   latest    76919cf0f33a   34 hours ago     1.81GB

user@cloudshell:~ (weighty-list-324021)$ docker tag 46ab5129fe8a coordinator

user@cloudshell:~ (weighty-list-324021)$ docker images
REPOSITORY                 TAG       IMAGE ID       CREATED              SIZE
coordinator                latest    46ab5129fe8a   About a minute ago   1.81GB
ahanaio/prestodb-sandbox   latest    76919cf0f33a   34 hours ago         1.81GB

Step 9:

Tag the image with the Artifact Registry path and push it to the registry:

user@cloudshell:~ docker tag coordinator:latest us-central1-docker.pkg.dev/weighty-list-324021/prestodb-ahana/coord:v1

user@cloudshell:~ docker push us-central1-docker.pkg.dev/weighty-list-324021/prestodb-ahana/coord:v1

Step 10:

Deploy the coordinator into the cluster using the kubectl commands below. The deployment references the coordinator image pushed to Artifact Registry in Step 9, since GKE cannot pull from Cloud Shell's local Docker image cache.

user@cloudshell:~ (weighty-list-324021)$ kubectl create deployment coordinator --image=us-central1-docker.pkg.dev/weighty-list-324021/prestodb-ahana/coord:v1
deployment.apps/coordinator created

user@cloudshell:~ (weighty-list-324021)$ kubectl expose deployment coordinator --name=presto-coordinator --type=LoadBalancer --port 8080 --target-port 8080
service/presto-coordinator exposed

user@cloudshell:~ (weighty-list-324021)$ kubectl get service
NAME                 TYPE           CLUSTER-IP    EXTERNAL-IP     PORT(S)          AGE
kubernetes           ClusterIP      10.7.240.1    <none>          443/TCP          41m
presto-coordinator   LoadBalancer   10.7.248.10   35.239.88.127   8080:30096/TCP   92s

Step 11:

Copy the external IP into a browser and check the cluster status.

Step 12:

Now, to deploy worker1 into the GKE cluster, again start a local container named worker1, this time from the coordinator image, using the docker run command.

user@cloudshell:~ docker run -d -p 8080:8080 -it --name worker1 coordinator
1d30cf4094eba477ab40d84ae64729e14de992ac1fa1e5a66e35ae553964b44b
user@cloudshell:~

Step 13:

Edit config.properties inside the worker1 container to set coordinator to false and http-server.http.port to 8081. The discovery.uri should point to the coordinator service running inside the GKE cluster.

user@cloudshell:~ (weighty-list-324021)$ docker exec -it worker1  bash                                                                                                                             
bash-4.2# vi etc/config.properties
bash-4.2# vi etc/config.properties
bash-4.2# cat etc/config.properties
coordinator=false
http-server.http.port=8081
discovery.uri=http://presto-coordinator01:8080

Step 14:

Stop the local worker1 container, commit it as an image, and tag that image as worker1:

user@cloudshell:~ (weighty-list-324021)$ docker stop worker1
worker1
user@cloudshell:~ (weighty-list-324021)$ docker commit worker1
sha256:cf62091eb03702af9bc05860dc2c58644fce49ceb6a929eb6c558cfe3e7d9abf
user@cloudshell:~ (weighty-list-324021)$ docker images
REPOSITORY                                                            TAG       IMAGE ID       CREATED         SIZE
<none>                                                                <none>    cf62091eb037   6 seconds ago   1.81GB

user@cloudshell:~ (weighty-list-324021)$ docker tag cf62091eb037 worker1:latest
user@cloudshell:~ (weighty-list-324021)$ docker images
REPOSITORY                                                            TAG       IMAGE ID       CREATED         SIZE
worker1                                                               latest    cf62091eb037   2 minutes ago   1.81GB

Step 15:

Push the worker1 image to the Artifact Registry location:

user@cloudshell:~ (weighty-list-324021)$ docker tag worker1:latest us-central1-docker.pkg.dev/weighty-list-324021/prestodb-ahana/worker1:v1

user@cloudshell:~ (weighty-list-324021)$ docker push us-central1-docker.pkg.dev/weighty-list-324021/prestodb-ahana/worker1:v1
The push refers to repository [us-central1-docker.pkg.dev/weighty-list-324021/prestodb-ahana/worker1]
b12c3306c4a9: Pushed
.
.
v1: digest: sha256:fe7db4aa7c9ee04634e079667828577ec4d2681d5ac0febef3ab60984eaff3e0 size: 2201

Step 16:

Deploy worker1 from the Artifact Registry location into the cluster and expose it using these kubectl commands:

user@cloudshell:~ (weighty-list-324021)$ kubectl create deployment presto-worker01 --image=us-central1-docker.pkg.dev/weighty-list-324021/prestodb-ahana/worker1:v1
deployment.apps/presto-worker01 created

user@cloudshell:~ (weighty-list-324021)$ kubectl expose deployment presto-worker01 --name=presto-worker01 --type=LoadBalancer --port 8081 --target-port 8081
service/presto-worker01 exposed

Step 17:

Check the Presto UI to confirm worker1 deployed successfully.

Step 18:

Repeat steps 12 to 17 to deploy worker2 inside a GKE container:

  • deploy a local instance using Docker and name it worker2
  • edit etc/config.properties inside the worker2 container to set coordinator to false, the port to 8082, and discovery.uri to the coordinator container name
  • stop the instance, then commit it and create a Docker image named worker2
  • push the worker2 image to the Artifact Registry location
  • use kubectl commands to deploy and expose the worker2 instance inside the cluster
  • check the PrestoDB UI to confirm the second worker is active
user@cloudshell:~ (weighty-list-324021)$ docker run -d -p 8080:8080 -it --name worker2 worker1
32ace8d22688901c9fa7b406fe94dc409eaf3abfd97229ab3df69ffaac00185d
user@cloudshell:~ (weighty-list-324021)$ docker exec -it worker2 bash
bash-4.2# vi etc/config.properties
bash-4.2# cat etc/config.properties
coordinator=false
http-server.http.port=8082
discovery.uri=http://presto-coordinator01:8080
bash-4.2# exit
exit
user@cloudshell:~ (weighty-list-324021)$ docker commit worker2
sha256:08c0322959537c74f91a6ccbdf78d0876f66df21872ff7b82217693dc3d4ca1e
user@cloudshell:~ (weighty-list-324021)$ docker images
REPOSITORY                                                              TAG       IMAGE ID       CREATED          SIZE
<none>                                                                  <none>    08c032295953   11 seconds ago   1.81GB

user@cloudshell:~ (weighty-list-324021)$ docker tag 08c032295953 worker2:latest

user@cloudshell:~ (weighty-list-324021)$ docker tag worker2:latest us-central1-docker.pkg.dev/weighty-list-324021/prestodb-ahana/worker2:v1

user@cloudshell:~ (weighty-list-324021)$ docker push us-central1-docker.pkg.dev/weighty-list-324021/prestodb-ahana/worker2:v1
The push refers to repository [us-central1-docker.pkg.dev/weighty-list-324021/prestodb-ahana/worker2]
aae10636ecc3: Pushed
.
.
v1: digest: sha256:103c3fb05004d2ae46e9f6feee87644cb681a23e7cb1cbcf067616fb1c50cf9e size: 2410

user@cloudshell:~ (weighty-list-324021)$ kubectl create deployment presto-worker02 --image=us-central1-docker.pkg.dev/weighty-list-324021/prestodb-ahana/worker2:v1
deployment.apps/presto-worker02 created

user@cloudshell:~ (weighty-list-324021)$ kubectl expose deployment presto-worker02 --name=presto-worker02 --type=LoadBalancer --port 8082 --target-port 8082
service/presto-worker02 exposed

user@cloudshell:~ (weighty-list-324021)$ kubectl get service
NAME                   TYPE           CLUSTER-IP     EXTERNAL-IP      PORT(S)          AGE
kubernetes             ClusterIP      10.7.240.1     <none>           443/TCP          3h35m
presto-coordinator01   LoadBalancer   10.7.241.37    130.211.208.47   8080:32413/TCP   49m
presto-worker01        LoadBalancer   10.7.255.27    34.132.29.202    8081:31224/TCP   9m15s
presto-worker02        LoadBalancer   10.7.254.137   35.239.88.127    8082:31020/TCP   39s
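The per-worker steps are identical apart from the name and port, so they can be condensed into one parameterized sketch (assumptions: the worker1 image exists locally, the registry path matches the project above, and the coordinator service is named presto-coordinator01):

```shell
deploy_worker() {
  local n="$1"
  local port=$((8080 + n))
  local repo="us-central1-docker.pkg.dev/weighty-list-324021/prestodb-ahana"
  # Start from the existing worker image, rewrite its config, and snapshot it
  docker run -d --name "worker${n}" worker1
  docker exec "worker${n}" sh -c "printf 'coordinator=false\nhttp-server.http.port=${port}\ndiscovery.uri=http://presto-coordinator01:8080\n' > etc/config.properties"
  docker stop "worker${n}"
  docker commit "worker${n}" "worker${n}:latest"
  # Push to Artifact Registry, then deploy and expose on the cluster
  docker tag "worker${n}:latest" "${repo}/worker${n}:v1"
  docker push "${repo}/worker${n}:v1"
  kubectl create deployment "presto-worker0${n}" --image="${repo}/worker${n}:v1"
  kubectl expose deployment "presto-worker0${n}" --name="presto-worker0${n}" \
    --type=LoadBalancer --port "${port}" --target-port "${port}"
}
deploy_worker 3
```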

Step 19:

Repeat steps 12 to 18 to provision worker3 inside the GKE cluster:

user@cloudshell:~ (weighty-list-324021)$ docker run -d -p 8080:8080 -it --name worker3 worker1
6d78e9db0c72f2a112049a677d426b7fa8640e8c1d3aa408a17321bb9353c545

user@cloudshell:~ (weighty-list-324021)$ docker exec -it worker3 bash                                                                                                                              
bash-4.2# vi etc/config.properties
bash-4.2# cat etc/config.properties
coordinator=false
http-server.http.port=8083
discovery.uri=http://presto-coordinator01:8080
bash-4.2# exit
exit

user@cloudshell:~ (weighty-list-324021)$ docker commit worker3
sha256:689f39b35b03426efde0d53c16909083a2649c7722db3dabb57ff0c854334c06
user@cloudshell:~ (weighty-list-324021)$ docker images
REPOSITORY                                                              TAG       IMAGE ID       CREATED          SIZE
<none>                                                                  <none>    689f39b35b03   25 seconds ago   1.81GB
ahanaio/prestodb-sandbox                                                latest    76919cf0f33a   37 hours ago     1.81GB

user@cloudshell:~ (weighty-list-324021)$ docker tag 689f39b35b03 worker3:latest

user@cloudshell:~ (weighty-list-324021)$ docker tag worker3:latest us-central1-docker.pkg.dev/weighty-list-324021/prestodb-ahana/worker3:v1

user@cloudshell:~ (weighty-list-324021)$ docker push us-central1-docker.pkg.dev/weighty-list-324021/prestodb-ahana/worker3:v1
The push refers to repository [us-central1-docker.pkg.dev/weighty-list-324021/prestodb-ahana/worker3]
b887f13ace4e: Pushed
.
.
v1: digest: sha256:056a379b00b0d43a0a5877ccf49f690d5f945c0512ca51e61222bd537336491b size: 2410

user@cloudshell:~ (weighty-list-324021)$ kubectl create deployment presto-worker03 --image=us-central1-docker.pkg.dev/weighty-list-324021/prestodb-ahana/worker3:v1
deployment.apps/presto-worker03 created

user@cloudshell:~ (weighty-list-324021)$ kubectl expose deployment presto-worker03 --name=presto-worker03 --type=LoadBalancer --port 8083 --target-port 8083
service/presto-worker03 exposed



Step 20:

Verify the PrestoDB environment by running the PrestoDB CLI with a simple JMX query:

user@presto-coordinator-1:~/presto-server$ ./presto-cli
presto> SHOW TABLES FROM jmx.current;
                                                              Table                                                              
-----------------------------------------------------------------------------------------------------------------------------------
com.facebook.airlift.discovery.client:name=announcer                                                                             
com.facebook.airlift.discovery.client:name=serviceinventory                                                                      
com.facebook.airlift.discovery.store:name=dynamic,type=distributedstore                                                          
com.facebook.airlift.discovery.store:name=dynamic,type=httpremotestore                                                           
com.facebook.airlift.discovery.store:name=dynamic,type=replicator

Summary

In this tutorial you learned how to provision and run PrestoDB on Google Cloud VM instances and on GKE containers, and how to validate the functional aspects of the resulting cluster.

If you want to run production Presto workloads at scale with strong performance, check out https://www.ahana.io, which provides a managed service for Presto.