hadoop, hadoop-yarn, google-cloud-dataproc

How do I restart Hadoop services on a Dataproc cluster?


I may be searching with the wrong terms, but Google is not telling me how to do this. How can I restart Hadoop services on Dataproc after changing some configuration files (YARN properties, etc.)?

Services have to be restarted in a specific order throughout the cluster. There must be scripts or tools out there, hopefully in the Dataproc installation, that I can invoke to restart the services across the cluster.


Solution

  • Configuring properties is a common and well-supported use case.

    You can do this by setting cluster properties when you create the cluster; no daemon restart is required. Example:

    gcloud dataproc clusters create my-cluster --properties yarn:yarn.resourcemanager.client.thread-count=100

    If you're doing something more advanced, like updating service log levels, then you can use systemctl to restart services.

    First, SSH to a cluster node and run systemctl to see the list of available services. For example, to restart the HDFS NameNode, run: sudo systemctl restart hadoop-hdfs-namenode.service

    If this runs as part of an initialization action, sudo is not needed.
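
The systemctl approach above could be wrapped in a small helper, shown here as a minimal sketch rather than anything Dataproc ships. The names restart_service and DRY_RUN are hypothetical, introduced only for illustration. The unit name is the one from the answer; run systemctl on your own node to find the others. The helper adds sudo only when it is not already running as root, which covers the initialization-action case.

```shell
#!/bin/sh
# Hypothetical helper (not part of Dataproc): restart one systemd unit.
# DRY_RUN=1 (the default here) only prints the command; DRY_RUN=0 runs it.
DRY_RUN="${DRY_RUN:-1}"

restart_service() {
  unit="$1"

  # Root (for example, inside an initialization action) does not need sudo.
  if [ "$(id -u)" -eq 0 ]; then
    prefix=""
  else
    prefix="sudo "
  fi

  if [ "$DRY_RUN" = "1" ]; then
    # Print what would be executed, without touching any service.
    echo "${prefix}systemctl restart ${unit}"
  elif [ -n "$prefix" ]; then
    sudo systemctl restart "$unit"
  else
    systemctl restart "$unit"
  fi
}

# Unit name taken from the answer above; other unit names vary by node role.
restart_service hadoop-hdfs-namenode.service
```

With DRY_RUN left at 1 the script only prints the command it would run, so it is safe to try on any machine. Set DRY_RUN=0 on a cluster node to perform the restart. If several services must be restarted in a particular order, call restart_service once per unit in that order.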