Search code examples
oracle-databasesasstatistical-sampling

simple random sampling while pulling data from warehouse(oracle engine) using proc sql in sas


I need to pull humongous amount of data, say 600-700 variables from different tables in a data warehouse...now the dataset in its raw form will easily touch 150 gigs - 79 MM rows and for my analysis purpose I need only a million rows...how can I pull data using proc sql directly from warehouse by doing simple random sampling on the rows.

Below code wont work as ranuni is not supported by oracle

    proc sql outobs =1000000;
    select * from connection to oracle(
    select * from tbl1 order by ranuni(12345);
    quit;

How do you propose I do it


Solution

  • None of the answers posted or comments helped my cause, it could but we have 87 MM rows

    Now I wanted the answer with the help of sas: here is what I did: and it works. Thanks all!

        libname dwh path username pwd;
        proc sql;
        create table sample as
        (select 
         <all the variables>, ranuni(any arbitrary seed)
         from dwh.<all the tables>
         <bunch of where conditions goes here>);
         quit);