Tags: python, pandas, oracle-database, numpy, dataset

Is there a way to create a random dataset of 10 million rows in Python?


I would like to create a random dataset consisting of 10 million rows. Unfortunately, I could not find a way to create a date column within a specific range (for example, from 01.01.2021 to 31.12.2021).

I tried with Oracle SQL but could not find a way to do that. I know how to do it in Excel, but Excel cannot handle 10 million rows of data. I therefore thought Python might be the best way to do it, but I could not figure it out.


Solution

  • I would like to create a random dataset consisting of 10 million rows. Unfortunately, I could not find a way to create a date column within a specific range (for example, from 01.01.2021 to 31.12.2021).

    I tried with Oracle SQL but could not find a way to do that.

    You can use the DBMS_RANDOM package with a hierarchical query:

    SELECT DATE '2021-01-01'
           + DBMS_RANDOM.VALUE(0, DATE '2022-01-01' - DATE '2021-01-01')
             AS random_date
    FROM   DUAL
    CONNECT BY LEVEL <= 10000000;
    

    This outputs, for example:

    RANDOM_DATE
    2021-11-25 00:53:13
    2021-08-28 22:33:35
    2021-02-11 23:28:50
    2021-12-10 05:39:00
    2021-01-10 22:02:47
    ...
    2021-01-01 16:39:13
    2021-10-30 20:58:21
    2021-03-14 06:27:34
    2021-10-11 00:24:03
    2021-04-20 03:53:54

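Since the question also asks about Python, here is a minimal sketch of the same idea using only the standard library. It mirrors the SQL above: take the start date and add a random offset smaller than the length of the range, so every value falls in 2021. The names `random_datetimes` and `write_csv`, the `seed` parameter, and the one-second resolution are illustrative assumptions, not part of the original answer.

```python
import csv
import random
from datetime import datetime, timedelta


def random_datetimes(n, start, end, seed=None):
    """Yield n datetimes drawn uniformly from [start, end), at one-second resolution."""
    rng = random.Random(seed)  # a fixed seed makes the sequence reproducible
    span_seconds = int((end - start).total_seconds())
    for _ in range(n):
        # Random whole number of seconds in [0, span_seconds), added to the start.
        yield start + timedelta(seconds=rng.randrange(span_seconds))


def write_csv(path, n, start, end, seed=None):
    """Stream n random datetimes to a CSV file without holding them all in memory."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["random_date"])
        for d in random_datetimes(n, start, end, seed):
            writer.writerow([d.strftime("%Y-%m-%d %H:%M:%S")])


START = datetime(2021, 1, 1)
END = datetime(2022, 1, 1)  # exclusive upper bound, like DATE '2022-01-01' in the SQL

# Small demonstration: print five random timestamps from 2021.
for d in random_datetimes(5, START, END, seed=42):
    print(d.strftime("%Y-%m-%d %H:%M:%S"))
```

For the full dataset, something like `write_csv("random_dates.csv", 10_000_000, START, END)` (the file name is a placeholder) writes the rows one at a time, so memory use stays flat. A pure-Python loop over 10 million rows is slow compared with a vectorized NumPy or pandas approach, which would be worth considering if generation time matters.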