I have installed Hadoop in pseudo-distributed mode on my laptop; the OS is Ubuntu. I have changed the paths where Hadoop stores its data (by default Hadoop stores data in the /tmp folder). My hdfs-site.xml file looks as below:
<property>
  <name>dfs.data.dir</name>
  <value>/HADOOP_CLUSTER_DATA/data</value>
</property>
Now whenever I restart the machine and try to start the Hadoop cluster using the start-all.sh script, the data node never starts. I confirmed that the data node does not start by checking the logs and by using the jps command.
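For reference, this is roughly how I check (the log file name assumes the default naming under $HADOOP_HOME/logs; adjust for your install):

jps
# DataNode is missing from the list of running daemons
tail -n 50 $HADOOP_HOME/logs/hadoop-$USER-datanode-$(hostname).log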
Then I:

1. Stopped the cluster using the stop-all.sh script.
2. Formatted the namenode using the hadoop namenode -format command.
3. Started the cluster using the start-all.sh script.

Now everything works fine, even if I stop and start the cluster again. The problem occurs only when I restart the machine and try to start the cluster.
By changing dfs.datanode.data.dir away from /tmp you indeed made the data (the blocks) survive across a reboot. However, there is more to HDFS than just blocks. You need to make sure all the relevant dirs point away from /tmp, most notably dfs.namenode.name.dir (I can't tell what other dirs you have to change; it depends on your config, but the namenode dir is mandatory and could also be sufficient). Ubuntu clears /tmp on reboot, so any HDFS state still kept there, the namenode metadata in particular, is lost every time the machine restarts.
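For example, on a 2.x distribution your hdfs-site.xml could carry both settings (a sketch; the paths are placeholders you should adapt to your layout):

<property>
  <name>dfs.namenode.name.dir</name>
  <value>/HADOOP_CLUSTER_DATA/name</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/HADOOP_CLUSTER_DATA/data</value>
</property>

Note that after pointing the namenode dir to a new, empty location you will have to format the namenode one more time, since the new directory has no filesystem image yet.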
I would also recommend using a more recent Hadoop distribution. BTW, the 1.1 namenode dir setting is dfs.name.dir.
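On 1.x that would look like this (again, the path is only a placeholder):

<property>
  <name>dfs.name.dir</name>
  <value>/HADOOP_CLUSTER_DATA/name</value>
</property>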