Introduction
This tutorial is Part III of our Getting started with PrestoDB series. As a reminder, PrestoDB is an open source distributed SQL query engine. In tutorial 102 we covered how to run a three-node PrestoDB cluster on a laptop. In this tutorial, we’ll show you how to run a PrestoDB cluster in a GCP environment using VM instances and GKE containers.
Environment
This guide was developed on GCP VM instances and GKE containers.
Presto on GCP with VMs
Implementation steps for PrestoDB on VM instances
Step 1: Create a GCP VM instance using the CREATE INSTANCE tab and name it presto-coordinator. Next, create three more VM instances named presto-worker1, presto-worker2 and presto-worker3.
Step 2: By default, GCP blocks all network ports, so PrestoDB needs ports 8080-8083 enabled. Use the Firewall rules tab to enable them.
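Alternatively, a firewall rule that opens these ports can be created from the command line. A minimal sketch, in which the rule name allow-presto and the wide-open source range are illustrative placeholders (restrict the source range to your own network in practice):
gcloud compute firewall-rules create allow-presto \
    --direction=INGRESS \
    --allow=tcp:8080-8083 \
    --source-ranges=0.0.0.0/0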
Step 3:
Install Java and Python.
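How you install them depends on the VM image. As a rough sketch for a Debian or Ubuntu based image (the package names are assumptions and may differ on your image; the Java version must satisfy the requirements of your Presto release, which for the 0.235 line is a 64-bit Java 8 runtime):
user@presto-coordinator-1:~$ sudo apt-get update
user@presto-coordinator-1:~$ sudo apt-get install -y openjdk-8-jdk python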
Step 4:
Download the Presto server tarball, presto-server-0.235.1.tar.gz, and unpack it. The tarball contains a single top-level directory, presto-server-0.235.1, which we will call the installation directory.
Run the commands below to download the official tarballs for presto-server and presto-cli from prestodb.io:
user@presto-coordinator-1:~$ curl -O https://repo1.maven.org/maven2/com/facebook/presto/presto-server/0.235.1/presto-server-0.235.1.tar.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  721M  100  721M    0     0   245M      0  0:00:02  0:00:02 --:--:--  245M
user@presto-coordinator-1:~$ curl -O https://repo1.maven.org/maven2/com/facebook/presto/presto-cli/0.235.1/presto-cli-0.235.1-executable.jar
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 12.7M  100 12.7M    0     0  15.2M      0  --:--:-- --:--:-- --:--:-- 15.1M
user@presto-coordinator-1:~$
Step 5:
Use gunzip and tar to unzip and untar the presto-server tarball
user@presto-coordinator-1:~$ gunzip presto-server-0.235.1.tar.gz; tar -xf presto-server-0.235.1.tar
Step 6: (optional)
Rename the directory to remove the version number
user@presto-coordinator-1:~$ mv presto-server-0.235.1 presto-server
Step 7:
Create etc, etc/catalog and data directories
user@presto-coordinator-1:~/presto-server$ mkdir etc etc/catalog data
Step 8:
Define the etc/node.properties, etc/config.properties, etc/jvm.config, etc/log.properties and etc/catalog/jmx.properties files as shown below for the Presto coordinator server.
user@presto-coordinator-1:~/presto-server$ cat etc/node.properties
node.environment=production
node.id=ffffffff-ffff-ffff-ffff-ffffffffffff
node.data-dir=/home/user/presto-server/data
user@presto-coordinator-1:~/presto-server$ cat etc/config.properties
coordinator=true
node-scheduler.include-coordinator=false
http-server.http.port=8080
query.max-memory=50GB
query.max-memory-per-node=1GB
query.max-total-memory-per-node=2GB
discovery-server.enabled=true
discovery.uri=http://localhost:8080
user@presto-coordinator-1:~/presto-server$ cat etc/jvm.config
-server
-Xmx16G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:+ExitOnOutOfMemoryError
-Djdk.attach.allowAttachSelf=true
user@presto-coordinator-1:~/presto-server$ cat etc/log.properties
com.facebook.presto=INFO
user@presto-coordinator-1:~/presto-server$ cat etc/catalog/jmx.properties
connector.name=jmx
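Defining the configuration files does not start anything by itself; the server is started with the launcher script that ships in the installation directory. A minimal sketch (use ./bin/launcher run instead of start to keep the process in the foreground while debugging):
user@presto-coordinator-1:~/presto-server$ ./bin/launcher start
user@presto-coordinator-1:~/presto-server$ ./bin/launcher status
The same two commands are used to start each worker node after it is configured in step 10.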
Step 9:
Check the cluster UI status at http://<coordinator-external-ip>:8080. It should show the Active worker count as 0, since only the coordinator has been enabled so far.
Step 10:
Repeat steps 1 to 8 on the remaining three VM instances, which will act as worker nodes.
On the configuration step for worker nodes, set coordinator to false and http-server.http.port to 8081, 8082 and 8083 for worker1, worker2 and worker3 respectively.
Also make sure node.id and http-server.http.port are different for each worker node.
user@presto-worker1:~/presto-server$ cat etc/node.properties
node.environment=production
node.id=ffffffff-ffff-ffff-ffff-ffffffffffffd
node.data-dir=/home/user/presto-server/data
user@presto-worker1:~/presto-server$ cat etc/config.properties
coordinator=false
http-server.http.port=8083
query.max-memory=50GB
query.max-memory-per-node=1GB
query.max-total-memory-per-node=2GB
discovery.uri=http://presto-coordinator-1:8080
user@presto-worker1:~/presto-server$ cat etc/jvm.config
-server
-Xmx16G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:+ExitOnOutOfMemoryError
-Djdk.attach.allowAttachSelf=true
user@presto-worker1:~/presto-server$ cat etc/log.properties
com.facebook.presto=INFO
user@presto-worker1:~/presto-server$ cat etc/catalog/jmx.properties
connector.name=jmx
Step 11:
Check the cluster status again; it should now show the three worker nodes as part of the PrestoDB cluster.
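Besides the UI, the coordinator's REST API can be used for a quick check from the command line; /v1/node is part of Presto's REST API and lists the worker nodes that have registered with the discovery service:
user@presto-coordinator-1:~$ curl -s http://localhost:8080/v1/node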
Step 12:
Verify the PrestoDB environment by running the PrestoDB CLI with a simple JMX query.
user@presto-coordinator-1:~/presto-server$ ./presto-cli
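The CLI was downloaded in step 4 as an executable jar, so it has to be renamed and made executable before it will run as shown above. The session below is a minimal sketch, assuming the coordinator is listening on localhost:8080; the --server, --catalog and --schema options simply preselect the JMX catalog:
user@presto-coordinator-1:~/presto-server$ mv ~/presto-cli-0.235.1-executable.jar ./presto-cli
user@presto-coordinator-1:~/presto-server$ chmod +x ./presto-cli
user@presto-coordinator-1:~/presto-server$ ./presto-cli --server localhost:8080 --catalog jmx --schema current
presto:current> SHOW TABLES FROM jmx.current;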
Implementation steps for PrestoDB on GKE containers
Step 1:
Go to the Google Cloud Console and activate the Cloud Shell window.
Step 2:
Create an artifacts repository using the command below, replacing REGION with the region in which you would like to create the repository.
gcloud artifacts repositories create ahana-prestodb \
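The command above is truncated; a complete invocation typically also specifies the repository format and location, and Docker must be authorized against the registry before any image can be pushed. A sketch, assuming the us-central1 region that the image paths in later steps use (note that the repository name you create must match the one used in those paths):
user@cloudshell:~ (weighty-list-324021)$ gcloud artifacts repositories create ahana-prestodb \
    --repository-format=docker \
    --location=us-central1 \
    --description="Docker repository for PrestoDB images"
user@cloudshell:~ (weighty-list-324021)$ gcloud auth configure-docker us-central1-docker.pkg.dev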
Step 3:
Create the container cluster by using the gcloud command:
user@cloudshell:~ (weighty-list-324021)$ gcloud config set compute/zone us-central1-c
Updated property [compute/zone].
user@cloudshell:~ (weighty-list-324021)$ gcloud container clusters create prestodb-cluster01
Creating cluster prestodb-cluster01 in us-central1-c…done.
Created
. . .
kubeconfig entry generated for prestodb-cluster01.
NAME                LOCATION       MASTER_VERSION   MASTER_IP     MACHINE_TYPE  NODE_VERSION     NUM_NODES  STATUS
prestodb-cluster01  us-central1-c  1.20.8-gke.2100  34.72.76.205  e2-medium     1.20.8-gke.2100  3          RUNNING
user@cloudshell:~ (weighty-list-324021)$
Step 4:
After container cluster creation, run the following command to see the cluster’s three nodes
user@cloudshell:~ (weighty-list-324021)$ kubectl get nodes
NAME                                                STATUS  ROLES   AGE    VERSION
gke-prestodb-cluster01-default-pool-34d21367-25cw   Ready   <none>  7m54s  v1.20.8-gke.2100
gke-prestodb-cluster01-default-pool-34d21367-7w90   Ready   <none>  7m54s  v1.20.8-gke.2100
gke-prestodb-cluster01-default-pool-34d21367-mwrn   Ready   <none>  7m53s  v1.20.8-gke.2100
user@cloudshell:~ (weighty-list-324021)$
Step 5:
Pull the prestodb docker image
user@cloudshell:~ (weighty-list-324021)$ docker pull ahanaio/prestodb-sandbox
Step 6:
Run ahanaio/prestodb-sandbox locally in the shell as a container named coordinator; it will later be committed as an image and deployed on the container cluster.
user@cloudshell:~ (weighty-list-324021)$ docker run -d -p 8080:8080 -it --name coordinator ahanaio/prestodb-sandbox
391aa2201e4602105f319a2be7d34f98ed4a562467e83231913897a14c873fd0
Step 7:
Edit the etc/config.properties file inside the container and set the node-scheduler.include-coordinator property to false. Then restart the coordinator container.
user@cloudshell:~ (weighty-list-324021)$ docker exec -i -t coordinator bash
bash-4.2# vi etc/config.properties
bash-4.2# cat etc/config.properties
coordinator=true
node-scheduler.include-coordinator=false
http-server.http.port=8080
discovery-server.enabled=true
discovery.uri=http://localhost:8080
bash-4.2# exit
exit
user@cloudshell:~ (weighty-list-324021)$ docker restart coordinator
coordinator
Step 8:
Now run docker commit and tag the resulting image ID as coordinator; this creates a new local image called coordinator.
user@cloudshell:~ (weighty-list-324021)$ docker commit coordinator
sha256:46ab5129fe8a430f7c6f42e43db5e56ccdf775b48df9228440ba2a0b9a68174c
user@cloudshell:~ (weighty-list-324021)$ docker images
REPOSITORY                 TAG     IMAGE ID      CREATED         SIZE
<none>                     <none>  46ab5129fe8a  15 seconds ago  1.81GB
ahanaio/prestodb-sandbox   latest  76919cf0f33a  34 hours ago    1.81GB
user@cloudshell:~ (weighty-list-324021)$ docker tag 46ab5129fe8a coordinator
user@cloudshell:~ (weighty-list-324021)$ docker images
REPOSITORY                 TAG     IMAGE ID      CREATED             SIZE
coordinator                latest  46ab5129fe8a  About a minute ago  1.81GB
ahanaio/prestodb-sandbox   latest  76919cf0f33a  34 hours ago        1.81GB
Step 9:
Tag the image with the Artifact Registry path and push it to the artifacts location.
user@cloudshell:~ docker tag coordinator:latest us-central1-docker.pkg.dev/weighty-list-324021/prestodb-ahana/coord:v1
user@cloudshell:~ docker push us-central1-docker.pkg.dev/weighty-list-324021/prestodb-ahana/coord:v1
Step 10:
Deploy the coordinator into the cloud container using the below kubectl commands.
user@cloudshell:~ (weighty-list-324021)$ kubectl create deployment coordinator --image=coordinator
deployment.apps/coordinator created
user@cloudshell:~ (weighty-list-324021)$ kubectl expose deployment coordinator --name=presto-coordinator --type=LoadBalancer --port 8080 --target-port 8080
service/presto-coordinator exposed
user@cloudshell:~ (weighty-list-324021)$ kubectl get service
NAME                 TYPE          CLUSTER-IP   EXTERNAL-IP    PORT(S)         AGE
kubernetes           ClusterIP     10.7.240.1   <none>         443/TCP         41m
presto-coordinator   LoadBalancer  10.7.248.10  35.239.88.127  8080:30096/TCP  92s
Step 11:
Copy the external IP into a browser and check the cluster UI status.
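If you prefer the command line, the coordinator's REST API answers on the same port; for example, using the external IP assigned above (/v1/info returns basic node and version information):
user@cloudshell:~ (weighty-list-324021)$ curl -s http://35.239.88.127:8080/v1/info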
Step 12:
Now to deploy worker1 into the GKE container, again start a local instance named worker1 using the docker run command.
user@cloudshell:~ docker run -d -p 8080:8080 -it --name worker1 coordinator
1d30cf4094eba477ab40d84ae64729e14de992ac1fa1e5a66e35ae553964b44b
user@cloudshell:~
Step 13:
Edit config.properties inside the worker1 container to set coordinator to false and http-server.http.port to 8081. Also, discovery.uri should point to the coordinator container running inside the GKE cluster.
user@cloudshell:~ (weighty-list-324021)$ docker exec -it worker1 bash
bash-4.2# vi etc/config.properties
bash-4.2# cat etc/config.properties
coordinator=false
http-server.http.port=8081
discovery.uri=http://presto-coordinator01:8080
Step 14:
Stop the local worker1 container, commit it as an image, and tag it as the worker1 image.
user@cloudshell:~ (weighty-list-324021)$ docker stop worker1
worker1
user@cloudshell:~ (weighty-list-324021)$ docker commit worker1
sha256:cf62091eb03702af9bc05860dc2c58644fce49ceb6a929eb6c558cfe3e7d9abf
user@cloudshell:~ (weighty-list-324021)$ docker images
REPOSITORY  TAG     IMAGE ID      CREATED        SIZE
<none>      <none>  cf62091eb037  6 seconds ago  1.81GB
user@cloudshell:~ (weighty-list-324021)$ docker tag cf62091eb037 worker1:latest
user@cloudshell:~ (weighty-list-324021)$ docker images
REPOSITORY  TAG     IMAGE ID      CREATED        SIZE
worker1     latest  cf62091eb037  2 minutes ago  1.81GB
Step 15:
Push the worker1 image to the Google Artifact Registry location.
user@cloudshell:~ (weighty-list-324021)$ docker tag worker1:latest us-central1-docker.pkg.dev/weighty-list-324021/prestodb-ahana/worker1:v1
user@cloudshell:~ (weighty-list-324021)$ docker push us-central1-docker.pkg.dev/weighty-list-324021/prestodb-ahana/worker1:v1
The push refers to repository [us-central1-docker.pkg.dev/weighty-list-324021/prestodb-ahana/worker1]
b12c3306c4a9: Pushed
. . .
v1: digest: sha256:fe7db4aa7c9ee04634e079667828577ec4d2681d5ac0febef3ab60984eaff3e0 size: 2201
Step 16:
Deploy and expose worker1 from the artifacts location into the Google cloud container using these kubectl commands.
user@cloudshell:~ (weighty-list-324021)$ kubectl create deployment presto-worker01 --image=us-central1-docker.pkg.dev/weighty-list-324021/prestodb-ahana/worker1:v1
deployment.apps/presto-worker01 created
user@cloudshell:~ (weighty-list-324021)$ kubectl expose deployment presto-worker01 --name=presto-worker01 --type=LoadBalancer --port 8081 --target-port 8081
service/presto-worker01 exposed
Step 17:
Check the Presto UI to confirm worker1 was deployed successfully.
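You can also confirm from Cloud Shell that the worker pod is running before looking at the UI; a quick sketch using standard kubectl commands:
user@cloudshell:~ (weighty-list-324021)$ kubectl get pods
user@cloudshell:~ (weighty-list-324021)$ kubectl logs deployment/presto-worker01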
Step 18:
Repeat steps 12 to 17 to deploy worker2 inside a GKE container (the full session is shown after this list):
- deploy a local instance using docker and name it worker2,
- edit etc/config.properties inside the worker2 container to set coordinator to false, the port to 8082 and discovery.uri to the coordinator container name,
- stop the instance, then commit it and create a docker image named worker2,
- push the worker2 image to the Google Artifact Registry location,
- use kubectl commands to deploy and expose the worker2 instance inside a Google container,
- check the PrestoDB UI to confirm the second worker is active.
user@cloudshell:~ (weighty-list-324021)$ docker run -d -p 8080:8080 -it --name worker2 worker1
32ace8d22688901c9fa7b406fe94dc409eaf3abfd97229ab3df69ffaac00185d
user@cloudshell:~ (weighty-list-324021)$ docker exec -it worker2 bash
bash-4.2# vi etc/config.properties
bash-4.2# cat etc/config.properties
coordinator=false
http-server.http.port=8082
discovery.uri=http://presto-coordinator01:8080
bash-4.2# exit
exit
user@cloudshell:~ (weighty-list-324021)$ docker commit worker2
sha256:08c0322959537c74f91a6ccbdf78d0876f66df21872ff7b82217693dc3d4ca1e
user@cloudshell:~ (weighty-list-324021)$ docker images
REPOSITORY  TAG     IMAGE ID      CREATED         SIZE
<none>      <none>  08c032295953  11 seconds ago  1.81GB
user@cloudshell:~ (weighty-list-324021)$ docker tag 08c032295953 worker2:latest
user@cloudshell:~ (weighty-list-324021)$ docker commit worker2
sha256:b1272b5e824fdebcfd7d434fab7580bb8660cbe29aec8912c24d3e900fa5da11
user@cloudshell:~ (weighty-list-324021)$ docker tag worker2:latest us-central1-docker.pkg.dev/weighty-list-324021/prestodb-ahana/worker2:v1
user@cloudshell:~ (weighty-list-324021)$ docker push us-central1-docker.pkg.dev/weighty-list-324021/prestodb-ahana/worker2:v1
The push refers to repository [us-central1-docker.pkg.dev/weighty-list-324021/prestodb-ahana/worker2]
aae10636ecc3: Pushed
. . .
v1: digest: sha256:103c3fb05004d2ae46e9f6feee87644cb681a23e7cb1cbcf067616fb1c50cf9e size: 2410
user@cloudshell:~ (weighty-list-324021)$ kubectl create deployment presto-worker02 --image=us-central1-docker.pkg.dev/weighty-list-324021/prestodb-ahana/worker2:v1
deployment.apps/presto-worker02 created
user@cloudshell:~ (weighty-list-324021)$ kubectl expose deployment presto-worker02 --name=presto-worker02 --type=LoadBalancer --port 8082 --target-port 8082
service/presto-worker02 exposed
user@cloudshell:~ (weighty-list-324021)$ kubectl get service
NAME                   TYPE          CLUSTER-IP    EXTERNAL-IP     PORT(S)         AGE
kubernetes             ClusterIP     10.7.240.1    <none>          443/TCP         3h35m
presto-coordinator01   LoadBalancer  10.7.241.37   130.211.208.47  8080:32413/TCP  49m
presto-worker01        LoadBalancer  10.7.255.27   34.132.29.202   8081:31224/TCP  9m15s
presto-worker02        LoadBalancer  10.7.254.137  35.239.88.127   8082:31020/TCP  39s
Step 19:
Repeat steps 12 to 18 to provision worker3 inside the Google cloud container.
user@cloudshell:~ (weighty-list-324021)$ docker run -d -p 8080:8080 -it --name worker3 worker1
6d78e9db0c72f2a112049a677d426b7fa8640e8c1d3aa408a17321bb9353c545
user@cloudshell:~ (weighty-list-324021)$ docker exec -it worker3 bash
bash-4.2# vi etc/config.properties
bash-4.2# cat etc/config.properties
coordinator=false
http-server.http.port=8083
discovery.uri=http://presto-coordinator01:8080
bash-4.2# exit
exit
user@cloudshell:~ (weighty-list-324021)$ docker commit worker3
sha256:689f39b35b03426efde0d53c16909083a2649c7722db3dabb57ff0c854334c06
user@cloudshell:~ (weighty-list-324021)$ docker images
REPOSITORY                 TAG     IMAGE ID      CREATED         SIZE
<none>                     <none>  689f39b35b03  25 seconds ago  1.81GB
ahanaio/prestodb-sandbox   latest  76919cf0f33a  37 hours ago    1.81GB
user@cloudshell:~ (weighty-list-324021)$ docker tag 689f39b35b03 worker3:latest
user@cloudshell:~ (weighty-list-324021)$ docker tag worker3:latest us-central1-docker.pkg.dev/weighty-list-324021/prestodb-ahana/worker3:v1
user@cloudshell:~ (weighty-list-324021)$ docker push us-central1-docker.pkg.dev/weighty-list-324021/prestodb-ahana/worker3:v1
The push refers to repository [us-central1-docker.pkg.dev/weighty-list-324021/prestodb-ahana/worker3]
b887f13ace4e: Pushed
. . .
v1: digest: sha256:056a379b00b0d43a0a5877ccf49f690d5f945c0512ca51e61222bd537336491b size: 2410
user@cloudshell:~ (weighty-list-324021)$ kubectl create deployment presto-worker03 --image=us-central1-docker.pkg.dev/weighty-list-324021/prestodb-ahana/worker3:v1
deployment.apps/presto-worker03 created
user@cloudshell:~ (weighty-list-324021)$ kubectl expose deployment presto-worker03 --name=presto-worker03 --type=LoadBalancer --port 8083 --target-port 8083
service/presto-worker03 exposed
Step 20:
Verify the PrestoDB environment by running the PrestoDB CLI with a simple JMX query.
user@presto-coordinator-1:~/presto-server$ ./presto-cli
presto> SHOW TABLES FROM jmx.current;
                                  Table
--------------------------------------------------------------------------
 com.facebook.airlift.discovery.client:name=announcer
 com.facebook.airlift.discovery.client:name=serviceinventory
 com.facebook.airlift.discovery.store:name=dynamic,type=distributedstore
 com.facebook.airlift.discovery.store:name=dynamic,type=httpremotestore
 com.facebook.airlift.discovery.store:name=dynamic,type=replicator
Summary
In this tutorial you learned how to provision and run PrestoDB on GCP VM instances and on GKE containers. You should now be able to validate the functional aspects of PrestoDB.
If you want to run production Presto workloads at scale and with good performance, check out https://www.ahana.io, which provides a managed service for Presto.