Search code examples
databasesassas-iml

How to replace a whole observation working on a SAS file through IML


Long time user, first time poster. I am very new to IML and have played around in R before. I am currently trying to create an adjacency list for easier computation of networks in SAS/IML. The file i am working on is huge. I am doing an implementation that involve using a SASfile and not have the adjacency list in memory. Creating an empty file and reading from specific rows (which corresponds to particular agents) all goes well until the "final" step: updating the entire observation.

Below is IML code that works, up until the last stage.

    proc iml;
    /* initialize vars*/
    checkObs = 2;
    numCol = 5;
    db = "myTestDataBase";
    nObs = 5;
    temp = {};
    myList = J(1, numCol, 0);
    nVarToUpdate = 2;

    /* create empty database */
    create (db) from myList;
        append from myList;
    close (db);
    do i = 1 to (nObs-1);
        edit (db);
            append from myList;
        close (db);
    end;

    /* read index checkObs and write to temp*/
    edit (db);
        read point (checkObs) into temp; /* Read an entire row*/
        temp[nVarToUpdate] = 1; /* I would like to update some values*/
        /* I want to replace point chekObs with the whole of vector temp*/
        replace point checkObs var _all_;
    close (db);
    print temp;

My aim is to replace/update a whole observation (row), while keeping the order of the rows intact. Any ideas?


Solution

  • @Joe's solution will work for this problem, but combining macro and IML in that way is like kissing your sister: it's not pleasant and people will look at you strangely. Joe had the right idea to get the name of the variables, but he forgot that you can use the VALSET call to perform indirect assignment. In other words, by having the name of a variable, you can change it's value.

    If you want to avoid the macro, you can get the names of the variables one time (outside the EDIT loop) and then loop over the names of the variales, like this:

    /* get column names ONE TIME */
    use (db);
       read next var _ALL_ into temp[colname=varNames]; /* get names of cols */
    close (db);
    
    /* read index checkObs and write to temp*/
    edit (db);
    read point (checkObs) into temp; /* Read an entire row*/
    temp[nVarToUpdate] = 1; /* I would like to update some values*/
    do i = 1 to ncol(temp);
       call valset(varNames[i], temp[i]); /* update scalar variables */
    end;
    replace point (checkObs) var _all_;
    close (db);
    

    The main advantage of this technique is that you can discover the variable names at run time.

    Be aware that this method (creating variable names) can be dangerous because if the data set has a variable named X, you are overwriting any pre-existing variable of that name in your program.

    Also be aware that using EDIT and READ POINT on a huge data set to change one row at a time will be slower than molasses flowing uphill in the winter time. If possible, you should read in a big block of data, operate on all rows in that block, and write out the block.

    If disk space allows, you might want to try using the SETIN and SETOUT statements to read from one data set while writing to another. That would completely eliminate the need for the REPLACE statement and the VALSET call. In general, it is more efficient to open a data set read-only and process it sequentially than to open it read/write and process it by using random access.