amazon-web-services amazon-s3 hbase amazon-emr

Is it possible to store HBase data on AWS S3 for an online application? How?


I am pretty new to AWS. I am planning to use HBase as the database for my system. I intend to install it on EC2 and keep its actual data files on S3, because of the lower storage cost and the good integration with EMR. To avoid extra cost, I don't want to run Amazon EMR just for HBase, which would need to be available 24/7, but I am going to use EMR for some analytics later. Any idea how to configure HBase for such a setup?


Solution

  • HBase can use any filesystem supported by Hadoop, including S3, but if you do not use EMR it will be too slow.

    To use S3 as the data store, you need to configure your Hadoop filesystem to point at S3.

    The hbase-site.xml may look like this:

    <configuration>
      <!-- Root directory where HBase stores its data; replace with your own bucket URI -->
      <property>
        <name>hbase.rootdir</name>
        <value>s3://ebucketrkr.s3.amazonaws.com:80/</value>
      </property>
    </configuration>
    

    The hadoop-site.xml (split into core-site.xml and related files in later Hadoop releases) may look like this:

    <configuration>
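      <!-- Default filesystem; "hbase" here is the S3 bucket name -->
      <property>
        <name>fs.default.name</name>
        <value>s3://hbase</value>
      </property>

      <!-- AWS credentials; "id" and "pass" are placeholders for your own keys -->
      <property>
        <name>fs.s3.awsAccessKeyId</name>
        <value>id</value>
      </property>

      <property>
        <name>fs.s3.awsSecretAccessKey</name>
        <value>pass</value>
      </property>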
    </configuration>
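
    The properties above target the older s3:// filesystem. Newer Hadoop releases ship the s3a connector instead, so a rough equivalent might look like the sketch below. It assumes the hadoop-aws module is on the classpath; "id" and "pass" are the same placeholders as above, and this has not been tested against a non-EMR HBase install.

    ```xml
    <configuration>
      <!-- Sketch only: s3a credential properties from the hadoop-aws module -->
      <property>
        <name>fs.s3a.access.key</name>
        <value>id</value>
      </property>

      <property>
        <name>fs.s3a.secret.key</name>
        <value>pass</value>
      </property>
    </configuration>
    ```

    With this in place, hbase.rootdir in hbase-site.xml would use the s3a:// scheme, for example s3a://my-hbase-bucket/hbase, where my-hbase-bucket is a hypothetical bucket name.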