I am using Terraform to create a dataproc cluster that uses a GCP cloudsql instance as the Hive metastore. The Terraform project creates the cluster and all its prerequisites (network, service account, cloudsql instance & user, etc).
cloud-sql-proxy.sh is provided to assist with this; however, I can't get it to work. When the cluster is created, cloud-sql-proxy.sh fails with this error:
nc: connect to localhost port 3306 (tcp) failed: Connection refused
I've banged my head against the wall trying to figure out why but can't get to the bottom of it so am hoping someone here can help.
I've hosted the Terraform project at https://github.com/jamiekt/democratising-dataproc. Reproducing the problem is very easy. Follow these steps:
Install gcloud if you haven't already
gcloud auth application-default login #creates a file containing credentials that terraform will use
git clone git@github.com:jamiekt/democratising-dataproc.git && cd democratising-dataproc
export GCP_PROJECT=name-of-project-you-just-created
make init
make apply
That should successfully spin up a network, subnetwork, cloudsql instance, a couple of storage buckets (one of them containing cloud-sql-proxy.sh), a service account and a firewall, then fail when attempting to create the dataproc cluster.
If anyone could take a look and tell me why this is failing I'd be very grateful.
There were a number of problems here that have now been solved:
hive:hive.metastore.warehouse.dir property needed setting
host = '%')
The state of the repo at the time of posting this message will work as intended (i.e. create, using Terraform, a dataproc cluster that uses a shared hive metastore).
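For anyone hitting the same error, here is a minimal sketch of what two of those fixes might look like in Terraform. It is not copied from the repo: every resource name, variable, region and bucket path below is hypothetical, and I am assuming that host = '%' refers to the host argument of the cloudsql user. It also leaves out the network, service account and cloud-sql-proxy.sh initialization action that the real project needs.

```hcl
# Hypothetical inputs; none of these names come from the repo.
variable "hive_password" {
  type      = string
  sensitive = true
}

variable "warehouse_bucket" {
  type        = string
  description = "Name of the GCS bucket that holds the Hive warehouse"
}

variable "metastore_instance" {
  type        = string
  description = "Name of the existing cloudsql instance used as the metastore"
}

# Assumption: the cloudsql user must be allowed to connect from any host,
# because the proxy's connections do not come from a fixed address.
resource "google_sql_user" "hive" {
  name     = "hive"
  instance = var.metastore_instance
  host     = "%"
  password = var.hive_password
}

resource "google_dataproc_cluster" "cluster" {
  name   = "demo-cluster" # hypothetical
  region = "europe-west1" # hypothetical

  cluster_config {
    software_config {
      # The "hive:" prefix tells Dataproc to write the property to hive-site.xml.
      override_properties = {
        "hive:hive.metastore.warehouse.dir" = "gs://${var.warehouse_bucket}/hive-warehouse"
      }
    }
  }
}
```

Pointing the warehouse directory at a bucket rather than at cluster-local HDFS is what lets several clusters share table data through the same metastore; the exact path is up to you.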
Thank you @igor-dvorzhak for your responses; your link to the article on configuring Hive Metastore to use Cloud SQL put me on the right track.