
How can I run Presto on Google Cloud Dataproc?


I want to run Presto on a Dataproc instance, or on Google Cloud Platform in general. How can I easily install and set up Presto, especially for use with Hive?


Solution

  • You can use an initialization action to install and configure Presto when you create a Cloud Dataproc cluster. The GitHub repository of Dataproc initialization actions includes one for Presto.

    If you want to use the Presto web UI, wait until the cluster is online, then follow the Cloud Dataproc directions for creating an SSH tunnel and SOCKS proxy to the cluster. From there, you can access Presto on port 8080 of the master node (the default, unless you change it).
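
As a rough illustration of the two steps above, here is a minimal Python sketch that only builds the `gcloud` command lines; it does not run them. The helper names, the cluster name `presto-demo`, the region and zone, the local port 1080, and the `gs://YOUR_BUCKET/presto/presto.sh` path are all placeholders I made up for the example, so substitute the real location of the Presto initialization action. It also assumes a standard single-master cluster, where the master node is named `<cluster>-m`.

```python
import shlex


def create_cluster_command(cluster_name, region, init_action_uri):
    """Build a gcloud argument list that creates a cluster with one initialization action."""
    return [
        "gcloud", "dataproc", "clusters", "create", cluster_name,
        f"--region={region}",
        # Script that runs on each node while the cluster is being created.
        f"--initialization-actions={init_action_uri}",
    ]


def socks_proxy_command(cluster_name, zone, local_port=1080):
    """Build a gcloud argument list that opens a SOCKS proxy to the master node."""
    # Assumption: single-master cluster, whose master node is named <cluster>-m.
    master = f"{cluster_name}-m"
    return [
        "gcloud", "compute", "ssh", master,
        f"--zone={zone}",
        "--",                    # everything after this is passed to ssh
        "-D", str(local_port),   # dynamic (SOCKS) port forwarding
        "-N",                    # open the tunnel without running a remote command
    ]


if __name__ == "__main__":
    # Placeholder values; replace them with your own before running the commands.
    create = create_cluster_command(
        "presto-demo", "us-central1", "gs://YOUR_BUCKET/presto/presto.sh"
    )
    proxy = socks_proxy_command("presto-demo", "us-central1-a")
    print(shlex.join(create))
    print(shlex.join(proxy))
```

Running the script prints two commands you can paste into a shell. With the proxy command left running, a browser configured to use the SOCKS proxy on `localhost:1080` should be able to reach the Presto web UI on port 8080 of the master node, assuming the default port has not been changed.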