I am loading my logs from S3 into Hive with
CREATE TABLE logs(
`col1` struct<`country`:string,`page`:string,`date`:string>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3a://application-logs/sample/' ;
My data looks like this
{
"col1": {
"country": "India",
"page": "/signup",
"date": "2018-01-01"
}
}
If I want to create a partition on col1.country, col1.page, col1.date how should I include that in create a statement, I tried like colName.fieldName, but with no success.
You can try directly without mentioning the column name, something like below
CREATE TABLE logs(
`col1` struct<`country`:string,`page`:string,`date`:string>
)
partitioned by (country string, page string, date string)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3a://application-logs/sample/' ;
Please note that external tables will not detect the partitions directly, you have to alter and add the partitions, something like below :
ALTER TABLE logs ADD PARTITION (country=india, pager=whatever, date=whatever) location '/hdfs/path/';
#You might also need to repair the table at the end
msck repair table schemaName.tableName