I'm trying to specify the base directory for HDFS files in my hdfs-site.xml under Windows 7 (Hadoop 2.7.1, which I built from source using Java SDK 1.8.0_45 and Windows SDK 7.1). I can't figure out how to provide a path that specifies a drive.
My hdfs-site.xml looks like this:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>XXX</value>
  </property>
</configuration>
and I tried various values for XXX, which I tested with hdfs namenode -format, all leading to one of these two errors:
XXX=D:/tmp/hdp:
15/07/10 23:38:33 ERROR namenode.NameNode: Failed to start namenode.
java.lang.IllegalArgumentException: URI has an authority component
at java.io.File.<init>(File.java:423)
at org.apache.hadoop.hdfs.server.namenode.NNStorage.getStorageDirectory(NNStorage.java:329)
XXX=D:\tmp\hdp:
ERROR common.Util: Syntax error in URI file://D:\tmp\hdp/dfs/name
Other variants that gave similar errors: file:///D:/tmp/hdp (from http://hortonworks.com/community/forums/topic/hadoop-configuration-files-issues/), file://D:/tmp/hdp, and D:\\tmp\\hdp.
And if I use /D/tmp/hdp, it does not crash, but the files go into a D folder on my current drive.
I'm out of ideas. Any suggestions? (NB: besides using Cygwin, which is not an option for me.)
You can specify a drive spec in hadoop.tmp.dir in core-site.xml by prepending a '/' to the absolute path and using '/' instead of '\' as the path separator for all path elements. For example, if the desired absolute path is D:\tmp\hdp, then it would look like this:
<property>
  <name>hadoop.tmp.dir</name>
  <value>/D:/tmp/hdp</value>
</property>
The reason this works is that the default values for many of the HDFS directories are configured as file://${hadoop.tmp.dir}/suffix. See the default definitions of dfs.namenode.name.dir, dfs.datanode.data.dir and dfs.namenode.checkpoint.dir here:
http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
Substituting the above value for hadoop.tmp.dir yields a valid file: URI with a drive spec and no authority, which satisfies the requirements for the HDFS configuration. For example, file://${hadoop.tmp.dir}/dfs/name becomes file:///D:/tmp/hdp/dfs/name. It's important to use '/' instead of '\', because a bare unencoded '\' character is not valid in URL syntax (see RFC 1738):
http://www.ietf.org/rfc/rfc1738.txt
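To see why the leading '/' matters, here is a minimal Java sketch that imitates the substitution with plain string concatenation and then hands the result to java.net.URI and java.io.File, the same JDK classes that appear in the stack trace from the question. It does not use Hadoop's own Configuration class; the class name DriveUriDemo and the helpers expand and classify are made up for illustration, and only the dfs/name suffix is tried.

```java
import java.io.File;
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical demo class, not part of Hadoop.
class DriveUriDemo {
    // Imitates the default "file://${hadoop.tmp.dir}/dfs/name" by plain
    // string substitution (an assumption; Hadoop does this internally).
    static String expand(String tmpDir) {
        return "file://" + tmpDir + "/dfs/name";
    }

    // Reports what java.net.URI and java.io.File make of the expanded value.
    static String classify(String tmpDir) {
        String expanded = expand(tmpDir);
        try {
            URI uri = new URI(expanded);
            // File(URI) rejects any file: URI that carries an authority component.
            new File(uri);
            return "ok: path=" + uri.getPath();
        } catch (URISyntaxException e) {
            // A bare backslash is not a legal URI character.
            return "syntax error";
        } catch (IllegalArgumentException e) {
            return "rejected: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        String[] candidates = {"/D:/tmp/hdp", "D:/tmp/hdp", "D:\\tmp\\hdp"};
        for (String value : candidates) {
            System.out.println(value + " -> " + classify(value));
        }
    }
}
```

Without the leading '/', the text after file:// is read as the authority part of the URI, so D: ends up where a host name would go and File refuses it. With the leading '/', the authority is empty and the drive spec is part of the path.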
If you prefer not to rely on this substitution behavior, then it's also valid to override all configuration properties that make use of hadoop.tmp.dir within your hdfs-site.xml file. Each value must be a full file: URI. For example:
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:///D:/tmp/hdp/dfs/name</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:///D:/tmp/hdp/dfs/data</value>
</property>
<property>
  <name>dfs.namenode.checkpoint.dir</name>
  <value>file:///D:/tmp/hdp/dfs/namesecondary</value>
</property>
You might find this more readable overall.
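If you have several Windows paths to convert into values of this form, a small helper along these lines may save some typing. It is only a sketch: the class WindowsPathToFileUri and the method toFileUri are hypothetical names, it handles only absolute paths that start with a drive letter, and it does not percent-encode spaces or other special characters.

```java
import java.net.URI;

// Hypothetical helper, not part of Hadoop.
final class WindowsPathToFileUri {
    // Turns an absolute Windows path such as D:\tmp\hdp\dfs\name into a
    // file: URI string with an empty authority and '/' separators.
    static String toFileUri(String windowsPath) {
        if (!windowsPath.matches("[A-Za-z]:[\\\\/].*")) {
            throw new IllegalArgumentException(
                "expected an absolute path with a drive letter: " + windowsPath);
        }
        return "file:///" + windowsPath.replace('\\', '/');
    }

    public static void main(String[] args) {
        String value = toFileUri("D:\\tmp\\hdp\\dfs\\name");
        System.out.println(value);
        // Sanity check: the result parses as a URI without an authority.
        System.out.println("authority: " + URI.create(value).getAuthority());
    }
}
```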