I am trying to use EMR/Hive to import data from S3 into DynamoDB. My CSV file has fields which are enclosed within double quotes and separated by comma. While creating external table in hive, I am able to specify delimiter as comma but how do I specify that fields are enclosed within quotes?
If I don’t specify, I see that values in DynamoDB are populated within two double quotes ““value”” which seems to be wrong.
I am using following command to create external table. Is there a way to specify that fields are enclosed within double quotes?
CREATE EXTERNAL TABLE emrS3_import_1(col1 string, col2 string, col3 string, col4 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '","' LOCATION 's3://emrTest/folder';
Any suggestions would be appreciated. Thanks Jitendra
If you're stuck with the CSV file format, you'll have to use a custom SerDe; and here's some work based on the opencsv libarary.
But, if you can modify the source files, you can either select a new delimiter so that the quoted fields aren't necessary (good luck), or rewrite to escape any embedded commas with a single escape character, e.g. '\', which can be specified within the ROW FORMAT with ESCAPED BY:
CREATE EXTERNAL TABLE emrS3_import_1(col1 string, col2 string, col3 string, col4 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' ESCAPED BY '\\' LOCATION 's3://emrTest/folder';