amazon-web-services amazon-s3 hive emr

Can I use S3 as Hive storage outside the Amazon EMR environment?


I have a case where I manage a service on an EC2 machine. The machine runs Hive, and I am planning to use S3 as my Hive storage (instead of HDFS). Is that possible?


Solution

  • Yes. There is a detailed write-up of how to do this here: http://blog.mustardgrain.com/2010/09/30/using-hive-with-existing-files-on-s3/

    Some choice bits:

    Now, let’s change our configuration a bit so that we can access the S3 bucket with all our data. First, we need to include the following configuration. This can be done via HIVE_OPTS, configuration files ($HIVE_HOME/conf/hive-site.xml), or via Hive CLI’s SET command.

    Here are the configuration parameters:

    Name                        Value
    fs.s3n.awsAccessKeyId       Your S3 access key
    fs.s3n.awsSecretAccessKey   Your S3 secret access key
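
    As a minimal sketch of the SET route mentioned above, the two properties could be set from the Hive CLI roughly like this. The values are hypothetical placeholders, not real credentials, and settings made with SET last only for the current session. On newer Hadoop versions the s3a connector (with fs.s3a.access.key and fs.s3a.secret.key) is the usual replacement for s3n, so the property names may need adjusting for your setup.

    ```sql
    -- Placeholder values (assumptions): substitute your own AWS credentials.
    -- These settings apply only to the current Hive session.
    SET fs.s3n.awsAccessKeyId=YOUR_ACCESS_KEY_ID;
    SET fs.s3n.awsSecretAccessKey=YOUR_SECRET_ACCESS_KEY;
    ```

    Putting the same properties in $HIVE_HOME/conf/hive-site.xml instead keeps them out of shell history and makes them apply to every session.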

    And:

    Whether you prefer the term veneer, façade, wrapper, or whatever, we need to tell Hive where to find our data and the format of the files. Let’s create a Hive table definition that references the data in S3:

    CREATE EXTERNAL TABLE mydata (key STRING, value INT)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '='
    LOCATION 's3n://mys3bucket/';
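
    Once the table is defined, it should be queryable like any other Hive table. A short sketch, assuming the bucket holds plain-text files with lines such as foo=1 (hypothetical sample data matching the '=' delimiter above):

    ```sql
    -- Assumes the objects under s3n://mys3bucket/ contain lines like: foo=1
    -- Hive splits each line on '=' into the key (STRING) and value (INT) columns.
    SELECT key, value
    FROM mydata
    LIMIT 10;
    ```

    Because the table is EXTERNAL, dropping it removes only the Hive metadata and leaves the files in S3 untouched.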