Search code examples
sasdatastep

SAS DO Loop seems to skip records


In writing a very simple DATA step to start a new project, I ran across some strange behavior.

The only difference between set1 and set2 is the use of the variable lagscore in the equation in set1 vs. dummy in the equation in set2.

set1 produces output that appears to indicate that including lagscore causes the score and lagscore variables to be undefined in half of the iterations.

Note that I was careful to NOT call lag() more than once and I include the call in set2 just to make sure that the lag() function call is not the source of the problem.

I appreciate any explanations. I've been away from SAS for quite awhile and I sense that I am missing something fundamental in how the processing occurs.

(Sorry for the difficult to read output. I could not figure out how to paste it and retain spacing)

data set1;
obs=1;
score=500;
a_dist = -5.0;
b_dist = 0.1;
dummy = 0;
output;
do obs = 2 to 10;
    lagscore = lag(score);
    score = lagscore + 1 /(b_dist * lagscore + a_dist);
    output;
end;
run;
data set2;
obs=1;
score=500;
a_dist = -5.0;
b_dist = 0.1;
dummy = 0;
output;
do obs = 2 to 10;
    lagscore = lag(score);
/*      score = lagscore + 1 /(b_dist * lagscore + a_dist);*/
    score = dummy + 1 /(b_dist * dummy + a_dist);
    output;
end;
run;`

Set1 results

obs score   a_dist  b_dist  dummy   lagscore

1   500     -5  0.1 0   .

2   .       -5  0.1 0   .

3   500.02  -5  0.1 0   500

4   .       -5  0.1 0   .

5   500.04  -5  0.1 0   500.02

6   .       -5  0.1 0   .

7   500.06  -5  0.1 0   500.04

8   .       -5  0.1 0   .

9   500.08  -5  0.1 0   500.06

10  .       -5  0.1 0   .

Set2 results

obs score   a_dist  b_dist  dummy   lagscore

1   500     -5  0.1 0   .

2   -0.2    -5  0.1 0   .

3   -0.2    -5  0.1 0   500

4   -0.2    -5  0.1 0   -0.2

5   -0.2    -5  0.1 0   -0.2

6   -0.2    -5  0.1 0   -0.2

7   -0.2    -5  0.1 0   -0.2

8   -0.2    -5  0.1 0   -0.2

9   -0.2    -5  0.1 0   -0.2

10  -0.2    -5  0.1 0   -0.2

Solution

  • The key point is that when you call the lag() function it returns a value from a queue that is initialized with missing values. The default is a queue with one item in it.

    In your code:

    score=500 ;
    *...;
    do obs = 2 to 10;
        lagscore = lag(score);
        score = lagscore + 1 /(b_dist * lagscore + a_dist);
        output;
    end;
    

    The first iteration of the loop (obs=2), LAGSCORE will be assigned a missing value because the queue is initialized with a missing value. The value 500 will be stored in the queue. SCORE will be assigned a missing value because LAGSCORE is missing, and therefore the expression lagscore + 1 /(b_dist * lagscore + a_dist) will return missing.

    The second iteration of the loop (obs=3), LAGSCORE will be assigned the value 500 (read from the queue), and the value of SCORE (a missing value) is written to the queue. Score is then assigned the value 500.2 from the expression lagscore + 1 /(b_dist * lagscore + a_dist).

    The third iteration of the loop (obs=4), LAGSCORE will be assigned a missing value (read from the queue) and the value 500.2 is written to the queue.

    And that pattern repeats.

    If I understand your intent, you don't actually need the LAG function for this sort of data creation. You can just use a DO loop with an output statement in it, and update the value of SCORE after you output each record. Something like:

    data set1 ;
      score = 500 ;
      a_dist = -5.0 ;
      b_dist = 0.1 ;
      do obs = 1 to 10 ;
        output ;
        score = score + (1 /(b_dist * score + a_dist)) ;
      end ;
    run ;
    

    Returns:

     score     a_dist    b_dist    obs
    
    500.000      -5        0.1       1
    500.022      -5        0.1       2
    500.044      -5        0.1       3
    500.067      -5        0.1       4
    500.089      -5        0.1       5
    500.111      -5        0.1       6
    500.133      -5        0.1       7
    500.156      -5        0.1       8
    500.178      -5        0.1       9
    500.200      -5        0.1      10