Search code examples
daterandomhivehiveql

Selecting date(1st of every month) from a date range in Hive


Need to generate random date(1st of every month) selected from a given date range in hive (inclusive range). For example if range is 25/12/2021 - 01/06/2022, then I want to select random date from this set of dates{01/01/2022, 01/02/2022, 01/03/2022, 01/04/2022, 01/05/2022, 01/06/2022).

Can any one guide me with my query?

I tried using

select concat('2019','-',lpad(floor(RAND()*100.0)%10+1,2,0),'-',lpad(floor(RAND()*100.0)%31+1,2,0));

but this needs date, I need to pass a column value as low range and a particular date as 2nd range. Since there are different dates for different columns for the low range to b passed.


Solution

  • You can use below code to calculate a random date between two dates.

    select trunc(date_add(start_dt, cast (datediff( end_dt,start_dt)*rand() as INT)),'MM') as random_dt
    

    You can test the logic using below code-

    select trunc(date_add('2021-01-17', cast (datediff( '2022-01-27','2021-01-17')*rand() as INT)), 'MM') as random_dt
    

    Explanation -
    Idea is to add a random number that is less than date difference to the start date.
    datediff() - This returns diff of date as INT.
    rand() - This returns a number between 0,1(both included). Which means, your start or end date can be same as random date sometime.
    date_add - This adds the random integer to the start date to generate random date. trunc(dt,'MM') - is going to return first day of the month.