We want to build a Presto production cluster on RHEL machines.
One of the machines is the Presto coordinator, and all the others are Presto workers.
What is the suggested minimum number of Presto workers for a production environment?
Some more details about Presto:
The Presto coordinator is the server that is responsible for parsing statements, planning queries, and managing Presto worker nodes. It is the “brain” of a Presto installation and is also the node to which a client connects to submit statements for execution. Every Presto installation must have a Presto coordinator alongside one or more Presto workers. For development or testing purposes, a single instance of Presto can be configured to perform both roles.
The coordinator keeps track of the activity on each worker and coordinates the execution of a query. The coordinator creates a logical model of a query involving a series of stages which is then translated into a series of connected tasks running on a cluster of Presto workers.
Coordinators communicate with workers and clients using a REST API.
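For the layout in the question, where one machine acts only as coordinator, its `etc/config.properties` could look roughly like the sketch below. It is modeled on the example in the Presto deployment documentation. The host name `example.net` and port `8080` are placeholders. Memory and other tuning settings are intentionally left out because they depend on your hardware.

```properties
# Dedicated coordinator: parses, plans and schedules queries
coordinator=true
# Do not schedule query tasks on the coordinator itself
node-scheduler.include-coordinator=false
# Port for the REST API used by clients and workers (placeholder)
http-server.http.port=8080
# Run the embedded discovery service on the coordinator
discovery-server.enabled=true
# Address of the discovery service (placeholder host name)
discovery.uri=http://example.net:8080
```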
Worker
A Presto worker is a server in a Presto installation which is responsible for executing tasks and processing data. Worker nodes fetch data from connectors and exchange intermediate data with each other. The coordinator is responsible for fetching results from the workers and returning the final results to the client.
When a Presto worker process starts up, it advertises itself to the discovery server in the coordinator, which makes it available to the Presto coordinator for task execution.
Workers communicate with other workers and Presto coordinators using a REST API.
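Here is a matching sketch of a worker's `etc/config.properties`, again based on the deployment documentation. The essential part is `discovery.uri`, which points at the coordinator's embedded discovery service so the worker can advertise itself on startup. The host name and port are placeholders and must match the coordinator's values.

```properties
# Worker only: executes tasks assigned by the coordinator
coordinator=false
# Port for the REST API used by the coordinator and other workers (placeholder)
http-server.http.port=8080
# Discovery service on the coordinator (placeholder host name)
discovery.uri=http://example.net:8080
```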
The minimal number of Presto workers is 1, regardless of your environment type.
You can configure your Presto coordinator node to also run as a worker and get a minimal single-node setup, for example to evaluate the features. According to the official guide, you can do this by specifying the following parameters in config.properties:
coordinator=true
node-scheduler.include-coordinator=true
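A fuller single-node `config.properties` might look like the following sketch, modeled on the single-machine example in the Presto deployment documentation. The port is a placeholder. Because coordinator and worker are the same process, the discovery URI points at `localhost`.

```properties
# One process acts as both coordinator and worker
coordinator=true
node-scheduler.include-coordinator=true
# REST API port (placeholder)
http-server.http.port=8080
# Embedded discovery service on this same node
discovery-server.enabled=true
discovery.uri=http://localhost:8080
```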
The minimal reasonable number of workers for production is hard to determine without additional information, such as the expected number of users, the number and size of your datasets, and the performance of your infrastructure.