Tags: r, sas, sas-macro, do-loops, sas-iml

How to call R repeatedly in a loop and retrieve the results for further processing in SAS


I have simplified the code to illustrate the problem:

proc iml;
var=40;
call ExportMatrixToR(var, "var" );
submit / R;
sample<-sample(1:var, 50, replace=TRUE)
endsubmit;
call ImportDataSetFromR( "WORK.rdata", "sample" );
proc means data=rdata; 
output out=a;
run;

How can I get better control of var? For example, if I would like to try different values of var (20, 40, 80, 100, 120, ...), how can I accomplish that the way people easily do with a macro?

Please note that rdata is transported from R to SAS to be analyzed, so we might need to create different data frames in R with names that depend on the value of var. Is there an easier way?

*******Update********

Dr. Wicklin, I have your book on my desk, it is amazing. Thank you so much for taking time to answer the question.

I tried your code and it worked perfectly, but I forgot to mention that my simulated data has a character variable. The submitted R code looks like this:

 submit Ni / R;
 sample<-sample(1:&Ni, 50, replace=TRUE)   
 group<-rep(LETTERS[1:2],25)
 df<-data.frame(sample, group)
 endsubmit;

I tried to adapt your code to accommodate this feature, but the SAS log keeps saying "Variable group has type inconsistent with the data set". Could you help?


*******Update2**************


    proc iml;
    N = do(20, 120, 20);
    ID = 1; sample = .; group = "";
    create rdata var {ID "sample" "group"}; /* open data set for writing */
    do i = 1 to ncol(N);
       Ni = N[i];    /* get the i_th parameter; pass in on the SUBMIT statement */
       submit Ni / R;
          sample<-sample(1:&Ni, 50, replace=TRUE)
          group<-rep(LETTERS[1:2],25)
       endsubmit;
       call ImportMatrixFromR(sample, "sample");
       call ImportMatrixFromR(group, "group");
       ID = j(nrow(sample), 1, i);   /* also save ID variable */
       append;              /* write IML data to SAS data set */
    end;
    close rdata;
    quit;

    proc means data=rdata;
    by ID;        /* analyze all the results in a single call */
    output out=a;
    run;

Solution

  • I assume you want to try these values sequentially, like in a loop? If so, your question is perhaps better phrased as, "how to call R repeatedly in a loop and retrieve the results for further processing in SAS."

    First, read the article "Twelve advantages to calling R from the SAS/IML language." The first item describes how to call R in a loop and provides an example. The third item shows how to pass parameters from SAS into R.

    Next, read the article "Simulation in SAS: The slow way or the BY way", which describes how to construct a SAS data set so that you can perform repeated computations in an efficient manner. Combining those two ideas leads to the following program structure:

    1. Create a loop in IML and call R repeatedly. Alternatively, you can send in a vector of parameters and do the looping in R. The second method can be more efficient, but the first matches your example better, so let's go with that option.
    2. After each analysis, retrieve the result(s). You can write the result to a SAS data set and include an ID variable that will be used as a BY variable in the next step.
    3. You now have a SAS data set that contains k results, each identified by an indicator variable. Call a SAS procedure (PROC MEANS in your example) to analyze each result.

    Here's an example:

    proc iml;
    N = do(20, 120, 20);
    ID = 1; sample = .;     /* we will write a numeric variable */
    create rdata var {ID "sample"}; /* open data set for writing */
    do i = 1 to ncol(N);
       Ni = N[i];    /* get the i_th parameter; pass in on the SUBMIT statement */
       submit Ni / R;
          sample<-sample(1:&Ni, 50, replace=TRUE)   # access parameter in R
       endsubmit;
       call ImportMatrixFromR(sample, "sample"); /* create IML var; copy from R */
       ID = j(nrow(sample), 1, i);   /* also save ID variable */
       append;              /* write IML data to SAS data set */
    end;
    close rdata;
    quit;
    
    proc means data=rdata; 
    by ID;        /* analyze all the results in a single call */
    output out=a;
    run;
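
    For comparison, here is a minimal, untested sketch of the alternative from step 1: send the whole parameter vector to R and do the looping there. The R-side names results and i are made up for illustration. The sketch assumes that ImportDataSetFromR (the call used in your original program) copies a data frame with both numeric and character columns in one step, which would also avoid having to create the character variable in IML:

    proc iml;
    N = do(20, 120, 20);
    call ExportMatrixToR(N, "N");   /* send all parameters to R at once */
    submit / R;
       # build one data frame per parameter value, then stack them
       results <- do.call(rbind, lapply(seq_along(N), function(i) {
          data.frame(ID = i,
                     sample = sample(1:N[i], 50, replace=TRUE),
                     group = rep(LETTERS[1:2], 25))
       }))
    endsubmit;
    call ImportDataSetFromR("WORK.rdata", "results");  /* one transfer back to SAS */
    quit;

    The resulting rdata data set has the same ID layout as the looped version, so the PROC MEANS step with BY ID should apply unchanged.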
    

    In the program, I've hard-coded the vector {20, 40, 60, ...}. You could equally well get those values from a macro variable or from an input data set. For example, to read them from a data set:

    data NValues;
    input Vals @@;
    datalines;
    20 40 60 80 100 120
    ;
    
    proc iml;
    use NValues; read all var "Vals"; close;
    N = T( Vals );
    /* ...etc ... */
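
    The macro-variable route works along the same lines. A minimal sketch, assuming a macro variable that holds the space-separated values (NList is a made-up name for illustration):

    %let NList = 20 40 60 80 100 120;

    proc iml;
    N = {&NList};   /* macro resolves first, so IML sees {20 40 60 80 100 120} */
    /* ...etc ... */

    Because the matrix literal is already a row vector, no transpose is needed in this variant.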