Tags: hadoop, hdfs, bigdata, ceph

Invalid URI for NameNode address, s3a is not of schema 'hdfs'


I'm working on replacing HDFS with Ceph in a Hadoop (YARN) environment. According to my research, the guideline from Hortonworks and the question "Replace HDFS form local disk to s3 getting error" show that I need to modify core-site.xml under $hadoop_home/etc/hadoop.
My modifications are as follows:

<property>
        <name>fs.s3a.access.key</name>
        <value>xxxxxxxxxxxxxx</value>
</property>
<property>
        <name>fs.s3a.secret.key</name>
        <value>xxxxxxxxxxxxx</value>
</property>
<property>
        <name>fs.default.name</name>
        <value>s3a://bucket_name</value>
</property>

<property>
        <name>fs.defaultFS</name>
        <value>s3a://bucket_name</value>
</property>
<property>
        <name>fs.s3a.endpoint</name>
        <value>http://x.x.x.x:xxxx</value>
</property>
<property>
        <name>fs.AbstractFileSystem.s3a.impl</name>
        <value>org.apache.hadoop.fs.s3a.S3A</value>
</property>
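
When pointing s3a at a non-AWS endpoint such as a Ceph RGW, two more S3A settings are often needed as well. A minimal sketch (these are standard S3A property names, but whether you need them depends on your RGW setup):

<!-- Ceph RGW buckets are usually addressed in path style rather than virtual-host style -->
<property>
        <name>fs.s3a.path.style.access</name>
        <value>true</value>
</property>
<!-- needed when the endpoint above is plain http, as in this example -->
<property>
        <name>fs.s3a.connection.ssl.enabled</name>
        <value>false</value>
</property>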

However, when I tried to start Hadoop with sbin/start-all.sh, I got the error below:

java.lang.IllegalArgumentException: Invalid URI for NameNode address (check fs.defaultFS): s3a://bucket_name is not of scheme 'hdfs'.

For your information, my Hadoop version is 3.2.0.

Thanks for your help in advance.


Solution

  • After digging into the Hadoop source code, I believe this exception is thrown by design.

    The check below can't be skipped when you invoke sbin/start-all.sh, because the script also tries to start the HDFS daemons, and the NameNode accepts only an hdfs:// URI in fs.defaultFS.

      /**
       * @return address of file system
       */
      public static InetSocketAddress getNNAddress(URI filesystemURI) {
        String authority = filesystemURI.getAuthority();
        if (authority == null) {
          throw new IllegalArgumentException(String.format(
              "Invalid URI for NameNode address (check %s): %s has no authority.",
              FileSystem.FS_DEFAULT_NAME_KEY, filesystemURI.toString()));
        }
        if (!HdfsConstants.HDFS_URI_SCHEME.equalsIgnoreCase(
            filesystemURI.getScheme())) {
          throw new IllegalArgumentException(String.format(
              "Invalid URI for NameNode address (check %s): " +
              "%s is not of scheme '%s'.", FileSystem.FS_DEFAULT_NAME_KEY,
              filesystemURI.toString(), HdfsConstants.HDFS_URI_SCHEME));
        }
        return getNNAddress(authority);
      }
    

    I don't need to start the NameNode or SecondaryNameNode at all, since I use Ceph as my backend storage system; Ceph manages its own storage daemons (its equivalent of DataNodes) through its own driver. So instead of sbin/start-all.sh, only the YARN daemons need to be started; see the sketch below.

    I'll keep this thread here for anyone who has the same concern, and I welcome any comments on my understanding.
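
    As a rough sketch of what that looks like in practice (the script names are the standard Hadoop 3.x ones; the bucket name is the placeholder from above):

      # Skip start-dfs.sh entirely: there is no NameNode or DataNode
      # to start when Ceph provides the storage layer.
      sbin/start-yarn.sh

      # Verify that the s3a filesystem is reachable through the Ceph endpoint.
      bin/hadoop fs -ls s3a://bucket_name/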