use hive . I need to generate random dates selected from a given date range, between '2019-01-01' and '2019-10-31'. Can any one guide me with my query?
Using floor(rand*100)%N+1
you can generate random natural numbers in the range 1 .. N. Use lpad to add leading 0 to get two gigits strings like 01, 02 if random number is single digit.
Demo:
select concat('2019','-',lpad(floor(RAND()*100.0)%10+1,2,0),'-',lpad(floor(RAND()*100.0)%31+1,2,0));
Result:
2019-04-31
Add date() function to solve problem with 28, 30 - 31 days in different months. date() will convert invalid date to nearest valid one: For example date('2019-02-31') returns 2019-03-03
All together:
select date(concat('2019','-',lpad(floor(RAND()*100.0)%10+1,2,0),'-',lpad(floor(RAND()*100.0)%31+1,2,0)));
Result:
2019-07-10
One more option is to generate required date range with order number and join it with random number:
set hivevar:start_date=2019-01-01;
set hivevar:end_date=2019-10-31;
with date_range as
(--this query generates date range with order number, max i=303 for this range
select date_add ('${hivevar:start_date}',s.i) as dt, i
from ( select posexplode(split(space(datediff('${hivevar:end_date}','${hivevar:start_date}')),' ')) as (i,x) ) s
)
--join range generated with random number 0..303(check max in date_range)
select d.dt from date_range d inner join (select floor(RAND()*100.0)%304 as i) r on d.i=r.i;
Result:
2019-01-12
All tested in Hive.