Tags: amazon-s3, hive, elastic-map-reduce, emr

Exporting a Hive table to an S3 bucket


I've created a Hive table through an Elastic MapReduce interactive session and populated it from a CSV file like this:

CREATE TABLE csvimport(id BIGINT, time STRING, log STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t';

LOAD DATA LOCAL INPATH '/home/hadoop/file.csv' OVERWRITE INTO TABLE csvimport;

I now want to store the Hive table in an S3 bucket so the table is preserved once I terminate the MapReduce instance.

Does anyone know how to do this?


Solution

  • Yes, you have to export your data at the end of your Hive session and import it at the start of the next one.

    To do this, you need to create a table that is mapped onto an S3 bucket and directory:

    CREATE TABLE csvexport (
      id BIGINT, time STRING, log STRING
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    LINES TERMINATED BY '\n'
    STORED AS TEXTFILE
    LOCATION 's3n://bucket/directory/';
    

    Insert data into the S3-backed table; when the insert is complete, the directory will contain a CSV file:

    INSERT OVERWRITE TABLE csvexport
    SELECT id, time, log
    FROM csvimport;
    

    Your table is now preserved on S3, and when you create a new Hive instance you can reimport your data, as sketched below.
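
    For example, to reimport the data in a new session, you can recreate a table over the same S3 location. This is a minimal sketch (the bucket and directory are placeholders); using an EXTERNAL table means Hive will not delete the S3 files if the table is later dropped:

     -- Recreate the table over the existing S3 data in a new Hive session.
     -- EXTERNAL: dropping this table will not delete the files in S3.
     CREATE EXTERNAL TABLE csvexport (
       id BIGINT, time STRING, log STRING
     )
     ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
     LINES TERMINATED BY '\n'
     STORED AS TEXTFILE
     LOCATION 's3n://bucket/directory/';

     -- The data is immediately queryable; no LOAD step is needed.
     SELECT COUNT(*) FROM csvexport;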

    Your table can also be stored in a few different Hive formats, depending on where you want to use it.
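
    For instance, if the data will be consumed by other Hadoop jobs rather than read as plain text, here is a sketch of the same export using Hive's built-in SEQUENCEFILE format (the table name and location are hypothetical placeholders):

     -- Same export, stored as a binary SequenceFile instead of delimited text.
     CREATE TABLE csvexport_seq (
       id BIGINT, time STRING, log STRING
     )
     STORED AS SEQUENCEFILE
     LOCATION 's3n://bucket/directory-seq/';

     INSERT OVERWRITE TABLE csvexport_seq
     SELECT id, time, log
     FROM csvimport;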