I have a hive table(consumer_data) with partition column 'val_dt' which is a string column having values in the date format 'yyyy-MM'.
I have multiple partitions in the table, from '2015-01' to '2020-04'. Each month the data is added incrementally to the table so the next partition added would be '2020-05'.
I want to drop the partitions that are older than 24 months. This partition delete is done every month to retain only the last 24 months data.
How can the ALTER command be framed to drop the partitions with values older than 24 months.
I ran the below beeline query with sample data through linux shell but it deleted all the partitions.
beeline --hivevar var_drop_date="$(date -d "24 months ago" '+%Y-%m')" -e 'ALTER TABLE consumer_data DROP IF EXISTS PARTITION(val_dt <= date "${hivevar:var_drop_date}")'
Partition present before running the query:
val_dt=2016-01
val_dt=2017-01
val_dt=2019-01
val_dt=2020-01
Partitions present after running the query:
None
Required output:
val_dt=2019-01
val_dt=2020-01
You do not need hivevar
when using -e
option, variables can be substituted directly. Also you do not need additional date
function in the partition specification if the partition is in yyyy-MM
format, you can calculate date in required format using shell only. Also 'yyyy-MM'
should be in quotes because partition is string, like this: val_dt<='2018-05'
Try this:
beeline -e "ALTER TABLE consumer_data DROP IF EXISTS PARTITION(val_dt <= '$(date -d "24 months ago" '+%Y-%m')')"