I need to pull humongous amount of data, say 600-700 variables from different tables in a data warehouse...now the dataset in its raw form will easily touch 150 gigs - 79 MM rows and for my analysis purpose I need only a million rows...how can I pull data using proc sql directly from warehouse by doing simple random sampling on the rows.
Below code wont work as ranuni is not supported by oracle
proc sql outobs =1000000;
select * from connection to oracle(
select * from tbl1 order by ranuni(12345);
quit;
How do you propose I do it
None of the answers posted or comments helped my cause, it could but we have 87 MM rows
Now I wanted the answer with the help of sas: here is what I did: and it works. Thanks all!
libname dwh path username pwd;
proc sql;
create table sample as
(select
<all the variables>, ranuni(any arbitrary seed)
from dwh.<all the tables>
<bunch of where conditions goes here>);
quit);