Search code examples
apache-sparkhive

Hive table partition by year month and day query


Hi I am using a hive table partitioned by 3 columns (year, month and day). I have the data in below format.

/table1/year=2024/month=1/day=20/file1
/table1/year=2024/month=1/day=16/file1
/table1/year=2024/month=1/day=9/file1

Need help to write a select query to get all the records from the latest year, month and day partition dynamically.

Below is the query with hard coded values and I need dynamic query for the same. Thanks

Select * from table1 where year=2024 and month=1 and day=20;


Solution

  • You can calculate those columns using year(), month(), and day() functions on sysdate()

    Select * from table1 where year=year(current_timestamp()) and month = month(current_timestamp())  and day=day(current_timestamp());
    

    if you dont know what is max partition in the table, you need to use below SQL -

    select * from table1  
    join (
    select max(from_unixtime(unix_timestamp( concat(day, case when length(month)=1,'0','' end,month,year)  , 'ddMMyyyy'))) as dt from table1) maxdt on 
    from_unixtime(unix_timestamp( concat(day, case when length(month)=1,'0','' end,month,year)  , 'ddMMyyyy')) = maxdt.dt
    

    the subquery maxdt in inner join returns max date of the partitions.