Tags: bash, hadoop, hdfs, cluster-computing, distributed-computing

Manually start HDFS every time I boot?


In short: should I start HDFS every time I come back to the cluster after a power-off?


I have successfully created a Hadoop cluster (after losing a few battles) and now I want to be very careful about how I proceed.

Should I execute start-dfs.sh every time I power on the cluster, or is it ready to execute my application's code right away? The same question applies to start-yarn.sh.

I am afraid that if I run it while something is wrong, it might leave garbage directories behind.


Solution

  • Just from playing around with the Hortonworks and Cloudera sandboxes, I can say that turning them on and off doesn't seem to cause any side effects.

    However, it is necessary to start the needed services every time the cluster boots.

    As far as power cycling goes in a real cluster, it is recommended to stop the services running on the respective nodes before powering them down (stop-dfs.sh and stop-yarn.sh). That way you avoid abrupt shutdowns, and any errors encountered while stopping the services are properly logged on each node.
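    The start/stop routine above can be sketched as a small wrapper script. This is a minimal sketch, not an official tool: it assumes the stock start-dfs.sh / start-yarn.sh / stop-dfs.sh / stop-yarn.sh scripts from $HADOOP_HOME/sbin are on the PATH, and the wrapper's name and structure are hypothetical.

    ```shell
    #!/usr/bin/env bash
    # cluster-services.sh (hypothetical wrapper): start Hadoop services after
    # booting the cluster, or stop them cleanly before powering it down.
    set -euo pipefail

    start_hadoop() {
      # Guard: do nothing if the Hadoop sbin scripts are not on the PATH.
      if ! command -v start-dfs.sh >/dev/null 2>&1; then
        echo "Hadoop scripts not on PATH; nothing to start."
        return 0
      fi
      start-dfs.sh    # brings up NameNode, DataNodes, SecondaryNameNode
      start-yarn.sh   # brings up ResourceManager and NodeManagers
      jps             # list running JVMs to verify the daemons came up
    }

    stop_hadoop() {
      if ! command -v stop-yarn.sh >/dev/null 2>&1; then
        echo "Hadoop scripts not on PATH; nothing to stop."
        return 0
      fi
      stop-yarn.sh    # stop YARN first so no new containers get scheduled
      stop-dfs.sh     # then stop HDFS
    }

    # Default to "start" when no argument is given.
    case "${1:-start}" in
      start) start_hadoop ;;
      stop)  stop_hadoop ;;
      *)     echo "usage: $0 [start|stop]" >&2; exit 1 ;;
    esac
    ```

    Running `cluster-services.sh start` after boot and `cluster-services.sh stop` before shutdown keeps the service lifecycle in one place; the guard clauses make the script a no-op on machines without Hadoop installed.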