Laconically: Should I start HDFS every time I come back to the cluster after a power-off operation?
I have successfully created a Hadoop cluster (after losing some battles) and now I want to be very careful about how I proceed.
Should I execute `start-dfs.sh` every time I power on the cluster, or is it ready to execute my application's code? Same for `start-yarn.sh`.
I am afraid that if I run it without everything being fine, it might leave garbage directories after execution.
Just from playing around with the Hortonworks and Cloudera sandboxes, I can say turning them on and off doesn't seem to demonstrate any "side-effects".
However, it is necessary to start the needed services every time the cluster starts.
As far as power cycling goes in a real cluster, it is recommended to stop the services running on the respective nodes before powering them down (`stop-dfs.sh` and `stop-yarn.sh`). That way there are no weird problems, and any errors on the way to stopping the services will be properly logged on each node.
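The routine above could be wrapped in two small helper functions. This is only a sketch: `cluster_up`, `cluster_down`, `run` and the `DRY_RUN` switch are names made up for this example, and it assumes the four Hadoop scripts are on your `PATH` (they normally live in `$HADOOP_HOME/sbin`). Starting HDFS before YARN and stopping them in the reverse order is a common convention rather than something your setup necessarily requires.

```shell
#!/usr/bin/env bash
# Sketch: bring HDFS and YARN up after power-on, and down before power-off.
# Assumption: start-dfs.sh, start-yarn.sh, stop-yarn.sh and stop-dfs.sh are on PATH.
# DRY_RUN is a hypothetical switch for this example: when set to 1,
# commands are printed instead of executed.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$@"
  else
    "$@"
  fi
}

# Call once after the machines are powered on.
cluster_up() {
  # Start YARN only if HDFS came up without an error.
  run start-dfs.sh && run start-yarn.sh
}

# Call before powering the machines off.
cluster_down() {
  # Attempt both stops even if the first one reports an error.
  run stop-yarn.sh
  run stop-dfs.sh
}
```

With the file sourced, `DRY_RUN=1 cluster_up` only prints the two commands it would run, which is a safe way to check the order first. After a real start, running `jps` on each node should list the Hadoop daemons expected there (for example `NameNode`, `DataNode`, `ResourceManager`, `NodeManager`) before you submit a job.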