
Setting hadoop.tmp.dir on Windows gives error: URI has an authority component


I'm trying to specify the base directory for HDFS files in my hdfs-site.xml under Windows 7 (Hadoop 2.7.1, which I built from source using Java SDK 1.8.0_45 and Windows SDK 7.1). I can't figure out how to provide a path that specifies a drive.

My hdfs-site.xml looks like this:

    <configuration>
        <property>
            <name>dfs.replication</name>
            <value>1</value>
        </property>
        <property>
            <name>hadoop.tmp.dir</name>
            <value>XXX</value>
        </property>
    </configuration>

and I tried various values for XXX, which I tested with hdfs namenode -format, all leading to one of these 2 errors:

  • XXX=D:/tmp/hdp:
    15/07/10 23:38:33 ERROR namenode.NameNode: Failed to start namenode.
    java.lang.IllegalArgumentException: URI has an authority component
        at java.io.File.<init>(File.java:423)
        at org.apache.hadoop.hdfs.server.namenode.NNStorage.getStorageDirectory(NNStorage.java:329)
  • XXX=D:\tmp\hdp: ERROR common.Util: Syntax error in URI file://D:\tmp\hdp/dfs/name

Other variants that gave similar errors: file:///D:/tmp/hdp (from http://hortonworks.com/community/forums/topic/hadoop-configuration-files-issues/), file://D:/tmp/hdp, D:\\tmp\\hdp

If I use /D/tmp/hdp, it does not crash, but the files go into a folder named D on my current drive.

I'm out of ideas, any suggestion? (NB: besides using Cygwin, which is not an option for me)


Solution

  • You can specify a drive spec in hadoop.tmp.dir in core-site.xml by prepending a '/' to the absolute path and using '/' instead of '\' as the path separator for all path elements. For example, if the desired absolute path is D:\tmp\hdp, the property would look like this:

    <property>
        <name>hadoop.tmp.dir</name>
        <value>/D:/tmp/hdp</value>
    </property>
    

    The reason this works is that the default values for many of the HDFS directories are configured to be file://${hadoop.tmp.dir}/suffix. See the default definitions of dfs.namenode.name.dir, dfs.datanode.data.dir and dfs.namenode.checkpoint.dir here:

    http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml

    Substituting the above value for hadoop.tmp.dir yields a valid file: URI with a drive spec and no authority, which satisfies the requirements for the HDFS configuration. It's important to use '/' instead of '\', because a bare unencoded '\' character is not valid in URL syntax (see RFC 1738):

    http://www.ietf.org/rfc/rfc1738.txt

    If you prefer not to rely on this substitution behavior, then it's also valid to override all configuration properties that make use of hadoop.tmp.dir within your hdfs-site.xml file. Each value must be a full file: URI. For example:

    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///D:/tmp/hdp/dfs/name</value>
    </property>
    
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///D:/tmp/hdp/dfs/data</value>
    </property>
    
    <property>
        <name>dfs.namenode.checkpoint.dir</name>
        <value>file:///D:/tmp/hdp/dfs/namesecondary</value>
    </property>
    

    You might find this more readable overall.
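
To see why the leading '/' matters, the substitution can be reproduced outside Hadoop with plain java.net.URI. This is only a rough sketch: the class name TmpDirUriCheck and its describe method are made up for illustration, and simple string concatenation stands in for Hadoop's own variable expansion of file://${hadoop.tmp.dir}/dfs/name. It should still show which candidate values end up with an authority component, which ones fail to parse, and which ones produce a clean path.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not part of Hadoop.
final class TmpDirUriCheck {

    // Assumption: approximates the default pattern file://${hadoop.tmp.dir}/dfs/name
    // with plain string concatenation, then classifies the result.
    static String describe(String tmpDir) {
        String expanded = "file://" + tmpDir + "/dfs/name";
        try {
            URI uri = new URI(expanded);
            if (uri.getAuthority() != null) {
                // java.io.File(URI) rejects file: URIs that carry an authority,
                // which is the "URI has an authority component" error.
                return "authority=" + uri.getAuthority();
            }
            return "ok path=" + uri.getPath();
        } catch (URISyntaxException e) {
            // A bare '\' is not a legal URI character, so parsing fails here.
            return "syntax error";
        }
    }

    public static void main(String[] args) {
        String[] candidates = {"/D:/tmp/hdp", "D:/tmp/hdp", "D:\\tmp\\hdp", "/D/tmp/hdp"};
        for (String candidate : candidates) {
            System.out.println(candidate + " -> " + describe(candidate));
        }
    }
}
```

With /D:/tmp/hdp the expanded URI is file:///D:/tmp/hdp/dfs/name, so the authority is empty and the drive letter stays in the path. With D:/tmp/hdp the drive letter lands right after file:// and is parsed as the authority. With /D/tmp/hdp the URI is valid, but the path has no drive spec at all, which matches the "D folder on my current drive" behavior from the question.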