I need to retain say last 7 partitions and data of a given hive external table.
This can be either done via a shell script or a hive hql script. The table is partitioned by intgestion_date=YYYY-MM-DD
what would be the best way to find the cutoff date (of 7th partition) which I can then use in the drop partitions where clause to drop everything older than that. since it's an external table, I will have to change the table properties to make it internal before the drop and then revert it.
There are different possible approaches: drop all partitions older than 7 days, this is easy (shell):
hive -e "ALTER TABLE mytable DROP IF EXISTS PARTITION(intgestion_date < '$(date -d "7 days ago" '+%Y-%m-%d')')"
But it seems this is not exactly what you want. Need to get 7th partition first and use it in the previous statement. Execute show partition, use sort, head and tail to get 7th partition:
seventh_partition=$(hive -e -S "show partitions table_name" | sort -r | head -n 7 | tail -n 1)
#extract value
part_value=${seventh_partition#*=}
#Execute drop older than 7th partition. Replace hive -e with echo and check what it prints
hive -e "ALTER TABLE table_name DROP IF EXISTS PARTITION(intgestion_date < '$part_value')"