Search code examples
sqlrandomhivehiveqldate-range

How to generate a random date from a given range with hive


use hive . I need to generate random dates selected from a given date range, between '2019-01-01' and '2019-10-31'. Can any one guide me with my query?


Solution

  • Using floor(rand*100)%N+1 you can generate random natural numbers in the range 1 .. N. Use lpad to add leading 0 to get two gigits strings like 01, 02 if random number is single digit.

    Demo:

     select concat('2019','-',lpad(floor(RAND()*100.0)%10+1,2,0),'-',lpad(floor(RAND()*100.0)%31+1,2,0));
    

    Result:

    2019-04-31
    

    Add date() function to solve problem with 28, 30 - 31 days in different months. date() will convert invalid date to nearest valid one: For example date('2019-02-31') returns 2019-03-03

    All together:

    select date(concat('2019','-',lpad(floor(RAND()*100.0)%10+1,2,0),'-',lpad(floor(RAND()*100.0)%31+1,2,0)));
    

    Result:

    2019-07-10
    

    One more option is to generate required date range with order number and join it with random number:

    set hivevar:start_date=2019-01-01; 
    set hivevar:end_date=2019-10-31; 
    
    with date_range as 
    (--this query generates date range with order number, max i=303 for this range 
    select date_add ('${hivevar:start_date}',s.i) as dt, i 
      from ( select posexplode(split(space(datediff('${hivevar:end_date}','${hivevar:start_date}')),' ')) as (i,x) ) s
    )
    
    --join range generated with random number 0..303(check max in date_range)
    select d.dt from date_range d inner join (select floor(RAND()*100.0)%304 as i) r on d.i=r.i;
    

    Result:

    2019-01-12
    

    All tested in Hive.