Can I use S3 as Hive storage outside the Amazon EMR environment?

I have a case where I manage the service on an EC2 machine. This machine runs Hive, and I am planning to use S3 as my Hive storage (instead of HDFS). Is this possible?

Solution

Yes, this is possible. There is a detailed write-up of how to do it here: http://blog.mustardgrain.com/2010/09/30/using-hive-with-existing-files-on-s3/

Some choice bits:

Now, let’s change our configuration a bit so that we can access the S3 bucket with all our data. First, we need to include the following configuration. This can be done via HIVE_OPTS, configuration files ($HIVE_HOME/conf/hive-site.xml), or via Hive CLI’s SET command.

Here are the configuration parameters:

Name                         Value
fs.s3n.awsAccessKeyId        Your S3 access key
fs.s3n.awsSecretAccessKey    Your S3 secret access key
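
As a rough illustration of the SET route mentioned above, the two properties could be set from the Hive CLI like this. The values are placeholders, not real credentials. Note that this sketch follows the s3n connector used in the write-up; on newer Hadoop releases, s3n has been superseded by s3a, whose equivalent properties are fs.s3a.access.key and fs.s3a.secret.key, so check which connector your Hadoop version ships before copying this.

```sql
-- Placeholder values: substitute your own credentials.
-- These settings last only for the current Hive session.
SET fs.s3n.awsAccessKeyId=YOUR_ACCESS_KEY;
SET fs.s3n.awsSecretAccessKey=YOUR_SECRET_KEY;
```

Setting the same two properties in $HIVE_HOME/conf/hive-site.xml instead makes them apply to every session, at the cost of storing the secret key in a file on the machine.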

And:

Whether you prefer the term veneer, façade, wrapper, or whatever, we need to tell Hive where to find our data and the format of the files. Let’s create a Hive table definition that references the data in S3:

CREATE EXTERNAL TABLE mydata (key STRING, value INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '='
LOCATION 's3n://mys3bucket/';
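
Once the table exists, a quick query is a reasonable way to confirm that Hive can actually read from the bucket. The sketch below assumes the objects in the bucket are plain text files with one key=value pair per line (for example a hypothetical line such as apples=3), which is what the '=' field delimiter in the table definition implies.

```sql
-- Show the table's storage location and format as Hive recorded them.
DESCRIBE FORMATTED mydata;

-- Read a few rows straight from the S3-backed table.
SELECT key, value FROM mydata LIMIT 10;
```

Because the table is EXTERNAL, dropping it later removes only the Hive metadata and leaves the files in S3 untouched.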