I have a case where I manage a service on an EC2 machine. This machine is running Hive, and I am planning to use S3 as my Hive storage (instead of HDFS). Is that possible?
There is a detailed write-up of how to do this here: http://blog.mustardgrain.com/2010/09/30/using-hive-with-existing-files-on-s3/
Some choice bits:
Now, let’s change our configuration a bit so that we can access the S3 bucket with all our data. First, we need to include the following configuration. This can be done via HIVE_OPTS, configuration files ($HIVE_HOME/conf/hive-site.xml), or via Hive CLI’s SET command.
Here are the configuration parameters:

fs.s3n.awsAccessKeyId: your S3 access key
fs.s3n.awsSecretAccessKey: your S3 secret access key
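As a quick sketch, the same settings can be applied per-session from the Hive CLI with SET (YOUR_ACCESS_KEY and YOUR_SECRET_KEY are placeholders for your real IAM credentials; note that newer Hadoop versions replace the legacy s3n connector with s3a, whose key names are fs.s3a.access.key and fs.s3a.secret.key):

```sql
-- Set the S3 credentials for the current Hive session only.
-- YOUR_ACCESS_KEY / YOUR_SECRET_KEY are placeholders, not real values.
SET fs.s3n.awsAccessKeyId=YOUR_ACCESS_KEY;
SET fs.s3n.awsSecretAccessKey=YOUR_SECRET_KEY;
```

Setting these via SET avoids putting credentials in a config file, at the cost of having to repeat them each session.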
And:
Whether you prefer the term veneer, façade, wrapper, or whatever, we need to tell Hive where to find our data and the format of the files. Let’s create a Hive table definition that references the data in S3:
CREATE EXTERNAL TABLE mydata (key STRING, value INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '='
LOCATION 's3n://mys3bucket/';
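Once the external table is defined, it can be queried like any other Hive table. For example, assuming the files in the bucket contain key=value lines matching the '=' delimiter above:

```sql
-- Hive reads the delimited files directly from S3 at query time;
-- no data is copied into HDFS.
SELECT key, SUM(value) AS total
FROM mydata
GROUP BY key;
```

Because the table is EXTERNAL, dropping it removes only the table definition in the metastore; the files in S3 are left untouched.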