
Presto: how to manage stop/start/status actions on Presto servers


We installed the following Presto cluster on Red Hat Linux 7.2:

Presto version: 0.216 (latest)

1 Presto coordinator

231 Presto workers

On each worker machine we can use the following command to check the status:

/app/presto/presto-server-0.216/bin/launcher status
Running as 61824

and stop/start it as follows:

/app/presto/presto-server-0.216/bin/launcher stop

/app/presto/presto-server-0.216/bin/launcher start

I also searched Google for a UI that can manage Presto status/stop/start, but I did not find anything.

It is very strange that Presto does not come with a user interface that shows the cluster status and can perform stop/start actions when needed.

As far as we know, the only Presto user interface shows status and does not offer actions such as stop/start.

enter image description here

In the example screen above we can see that only 5 of the 231 Presto workers are active, but this UI does not support stop/start actions and does not show on which workers Presto is not active.

So what can we do about it?

It is a very bad idea to log in to each worker machine to see whether Presto is up or down.

Why does Presto not have a centralized UI that can perform stop/start actions?

enter image description here

An example of what we expect from the UI (partial list):

enter image description here

. . .


Solution

  • In my opinion, and from my experience managing a prestosql cluster, this comes down to the service discovery patterns used in the architecture.

    So far, the open source releases of prestodb/prestosql use the following patterns:

    1. Server-side service discovery: a client app such as the Presto CLI, or any app that uses a Presto SDK, only needs to reach the coordinator and does not need to be aware of the worker nodes.
    2. Service registry: a place that keeps track of the available instances.
    3. Self-registration: a service instance is responsible for registering itself with the service registry. This is the key part, because it forces several behaviors:
       a. Service instances must be registered with the service registry on startup and unregistered on shutdown.
       b. Service instances that crash must be unregistered from the service registry.
       c. Service instances that are running but incapable of handling requests must be unregistered from the service registry.

    So the lifecycle management of each Presto worker is left to the instance itself.
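The self-registration pattern described above can be sketched with a toy in-memory registry. This is only an illustration of the pattern, not Presto's actual discovery code: the class name `ServiceRegistry`, the 30-second TTL, and the heartbeat-by-re-registering approach are all assumptions made for the example.

```python
import time
from typing import Callable, Dict, List


class ServiceRegistry:
    """Toy registry: instances announce themselves and age out if they
    stop sending heartbeats. Illustrative only, not Presto code."""

    def __init__(self, ttl_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds          # assumed expiry window
        self._clock = clock              # injectable so tests can fake time
        self._last_seen: Dict[str, float] = {}

    def register(self, instance_id: str) -> None:
        # Called by the instance itself on startup and on every heartbeat.
        self._last_seen[instance_id] = self._clock()

    def unregister(self, instance_id: str) -> None:
        # Called by the instance itself on a clean shutdown.
        self._last_seen.pop(instance_id, None)

    def active_instances(self) -> List[str]:
        # Crashed or hung instances stop heartbeating, so they drop out
        # of this list once their last heartbeat is older than the TTL.
        now = self._clock()
        return sorted(name for name, seen in self._last_seen.items()
                      if now - seen <= self._ttl)
```

Nothing in this model starts or stops an instance from the outside; the registry only reflects what the instances report. That is consistent with a UI that can show how many workers are active but offers no stop/start buttons.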

    "So what can we do about it?"

    The Presto cluster itself provides some observability through its HTTP API, for example /v1/node and /v1/service/presto, which show instance status. Personally, I recommend using a separate cluster manager such as k8s or Nomad to manage the Presto cluster members.
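As a rough sketch of how the /v1/node endpoint could be used to find out which workers are not active, the snippet below compares an inventory of expected hosts with what the coordinator reports. It assumes that each entry in the returned JSON list carries a `uri` field; check the actual output of your Presto version first. The function names and the coordinator URL in the usage comment are hypothetical.

```python
import json
from typing import Iterable, List, Set
from urllib.parse import urlparse
from urllib.request import urlopen


def fetch_nodes(coordinator_url: str, timeout: float = 10.0) -> list:
    """GET <coordinator>/v1/node and return the decoded JSON list."""
    url = coordinator_url.rstrip("/") + "/v1/node"
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def active_hosts(nodes: Iterable[dict]) -> Set[str]:
    """Extract the host part of each node's `uri` (assumed field name)."""
    hosts = set()
    for node in nodes:
        host = urlparse(node.get("uri", "")).hostname
        if host:
            hosts.add(host)
    return hosts


def missing_workers(expected_hosts: Iterable[str],
                    nodes: Iterable[dict]) -> List[str]:
    """Hosts in our own inventory that the coordinator does not list."""
    return sorted(set(expected_hosts) - active_hosts(nodes))


# Usage (hypothetical coordinator address and inventory):
#   nodes = fetch_nodes("http://coordinator.example:8080")
#   print(missing_workers(["10.0.0.1", "10.0.0.2"], nodes))
```

The inventory has to use the same form that the nodes report in `uri`, which may be IP addresses rather than hostnames.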

    "It is a very bad idea to log in to each worker machine to see whether Presto is up or down. Why does Presto not have a centralized UI that can perform stop/start actions?"

    I have no opinion on whether it is good or bad. Take k8s as an example: you can manage all Presto workers as one k8s Deployment, with each Presto worker in its own pod. Liveness, readiness and startup probes can then automate the instance lifecycle with a few lines of YAML; see, for example, the livenessProbe design in the stable/presto Helm chart. A cluster manager like k8s also provides a web UI, so you can operate on resources as an admin. Alternatively, you can write more Java code to extend Presto.
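To make the probe idea concrete, here is a minimal sketch of what a liveness check with a failure threshold amounts to. It is not the Helm chart's probe definition. It assumes that a worker answering HTTP 200 on /v1/info counts as alive, and the names `http_alive`, `probe_all`, `LivenessTracker` and the threshold of 3 are made up for the example.

```python
from http.client import HTTPException
from typing import Callable, Dict, Iterable
from urllib.request import urlopen


def http_alive(base_url: str, timeout: float = 5.0) -> bool:
    """True if GET <base_url>/v1/info answers with HTTP 200 (assumed check)."""
    try:
        with urlopen(base_url.rstrip("/") + "/v1/info", timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, ValueError, HTTPException):
        # Connection refused, timeout, HTTP error status, or malformed URL.
        return False


def probe_all(worker_urls: Iterable[str],
              check: Callable[[str], bool] = http_alive) -> Dict[str, bool]:
    """Run one probe round against every worker URL."""
    return {url: check(url) for url in worker_urls}


class LivenessTracker:
    """Counts consecutive probe failures per target."""

    def __init__(self, failure_threshold: int = 3) -> None:
        self.failure_threshold = failure_threshold
        self._failures: Dict[str, int] = {}

    def record(self, target: str, alive: bool) -> bool:
        """Store one probe result; return True once the target has failed
        `failure_threshold` times in a row and should be restarted."""
        if alive:
            self._failures[target] = 0
        else:
            self._failures[target] = self._failures.get(target, 0) + 1
        return self._failures[target] >= self.failure_threshold
```

A cluster manager runs this kind of loop for you and restarts the container when the threshold is reached, which is why delegating it to k8s or Nomad is usually less work than maintaining such a script for 231 workers.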