Hadoop namenode needs to be formatted after every computer start


I have searched for this problem, and while there are a number of similar questions, I can't find a common solution or one that works for me. I have installed Hadoop and am running it in pseudo-distributed mode. It works fine, and I can start and stop it any number of times without trouble. However, if I restart the computer and then start Hadoop, the namenode doesn't start. I need to format it every time, which means I lose all the work I have done and need to start again.

I am following Hadoop: The Definitive Guide v3.

My core-site.xml says:

<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost/</value>
    </property>
</configuration>

My hdfs-site.xml says:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>

Is there a way of configuring Hadoop so that I don't need to re-format the namenode every time I restart the computer?

Thanks.


Solution

  • It looks like you are not overriding the HDFS properties dfs.name.dir and dfs.data.dir. By default both resolve to subdirectories of hadoop.tmp.dir, which is /tmp/hadoop-${user.name}, and /tmp is typically cleared when the machine restarts. Point them at a location outside /tmp, for example in your home directory, by overriding these values in the hdfs-site.xml file in your Hadoop configuration directory.

    Follow these steps:

    Create a directory in your home directory to hold the namenode image and the datanode blocks (replace <USER> with your login name):

    mkdir /home/<USER>/pseudo/
    

    Modify the hdfs-site.xml file in your HADOOP_CONF_DIR (the Hadoop configuration directory) as follows:

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
      <property>
        <name>dfs.name.dir</name>
        <value>file:///home/<USER>/pseudo/dfs/name</value>
      </property>
      <property>
        <name>dfs.data.dir</name>
        <value>file:///home/<USER>/pseudo/dfs/data</value>
      </property>
      <property>
        <name>dfs.replication</name>
        <value>1</value>
      </property>
    </configuration>
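
    Before formatting, it may be worth confirming that the file you edited really sets both directories to a location outside /tmp. The following is only a sketch: hadoop_prop and check_storage_dir are hypothetical helper names, not part of Hadoop, and the text matching assumes each <value> sits on the line directly after its <name>, as in the file above.

    ```shell
    # Hypothetical helper (not a Hadoop command): print the <value> configured
    # for a property in a *-site.xml file, or nothing if the property is absent.
    # Assumes <value> is on the line immediately following <name>.
    hadoop_prop() {
        grep -F -A1 "<name>$2</name>" "$1" \
            | sed -n 's:.*<value>\(.*\)</value>.*:\1:p'
    }

    # Hypothetical helper: report whether a storage directory property is
    # unset, points into /tmp, or points somewhere else.
    check_storage_dir() {
        value=$(hadoop_prop "$1" "$2")
        case "$value" in
            "")
                echo "$2 is not set, so the default under hadoop.tmp.dir applies" ;;
            /tmp/*|file:///tmp/*)
                echo "$2 is under /tmp: $value" ;;
            *)
                echo "$2 = $value" ;;
        esac
    }

    # Example calls (assumes HADOOP_CONF_DIR is set in your environment):
    #   check_storage_dir "$HADOOP_CONF_DIR/hdfs-site.xml" dfs.name.dir
    #   check_storage_dir "$HADOOP_CONF_DIR/hdfs-site.xml" dfs.data.dir
    ```

    If either call reports that the property is not set or is under /tmp, the namenode metadata is still at risk of being cleared on reboot.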
    

    Format your hdfs namenode & start using