SAS Do Loop is Omitting Rows in Processing


I have the following code. I am trying to test a paragraph (descr) against a list of keywords (key_words). When I execute this code, the log shows every keyword being read into the array, but the step only tests 2 of the 20,000 rows in the second do loop (do i = 1 to 100 onward). Any suggestions on how to fix this issue?

data JE.KeywordMatchTemp1;
  set JE.JEMasterTemp end=eof;
  if _n_ = 1 then do i = 1 by 1 until (eof);
    set JE.KeyWords;
    array keywords[100] $30 _temporary_;
    keywords[i] = Key_Words;
  end;
  match = 0;
  do i = 1 to 100;
    if index(descr, keywords[i]) then match = 1;
  end;
  drop i;
run;

Solution

  • Your problem is that your end=eof is in the wrong place.

    This is a trivial example calculating the 'rank' of the age value for each SASHELP.CLASS respondent.

    See where I put the end=eof: on the set statement inside the loop. It has to be there because you need it to control the array-filling operation. Otherwise your loop, do i = 1 by 1 until (eof);, doesn't do what it says it should. It does not stop at the end of the keyword dataset, because eof is defined on the first set statement and so is still false while the array is being filled. Instead the loop ends when the inner set statement tries to read past the end of its dataset, which is specifically what you don't want.

    That's what the end=eof is doing: it stops the loop before you try to pull a row from the array-filling dataset after it is finished, an attempt that would terminate the whole data step. Any time you see a data step terminate after exactly 2 iterations, this is very likely the problem - it is a very common issue.

    data class_ranks;
      set sashelp.class;   *This dataset you are okay iterating over until the end of the dataset and then quitting the data step, like a normal data step.;
      array ages[19] _temporary_; 
      if _n_=1 then do;
        do _i = 1 by 1 until (eof);   *iterate until the end of the *second* set statement;
          set sashelp.class end=eof;  *see here? This eof is telling this loop when to stop.  It is okay that it is not created until after the loop is.;
          ages[_i] = age;
        end;
        call sortn(of ages[*]);   *ordering the ages loaded by number so they are in proper order for doing the trivial rank task;
      end;
      age_rank = whichn(age,of ages[*]);  *determine where in the list the age falls.  For a real version of this task you would have to check whether this ever happens, and if not you would have to have logic to find the nearest point or whatnot.;
    run;
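
    Applied to the step in the question, the same fix might look like the sketch below. It assumes JE.KeyWords has at most 100 rows (the array size used in the question). The n_keys counter and the trim() call are additions that are not in the original code: the counter keeps the search loop away from unfilled array slots, and trim() drops the trailing blanks that pad each $30 element, which index() would otherwise treat as part of the keyword.

    data JE.KeywordMatchTemp1;
      set JE.JEMasterTemp;                       /* no end= here: this set drives the step as usual */
      array keywords[100] $30 _temporary_;       /* assumes at most 100 keywords */
      retain n_keys 0;                           /* hypothetical helper: number of keywords loaded */
      if _n_ = 1 then do i = 1 by 1 until (eof);
        set JE.KeyWords end=eof;                 /* eof now belongs to the keyword dataset */
        keywords[i] = Key_Words;
        n_keys = i;                              /* remember how many slots were filled */
      end;
      match = 0;
      do i = 1 to n_keys;                        /* search only the filled slots */
        if index(descr, trim(keywords[i])) then match = 1;
      end;
      drop i n_keys;
    run;

    The only change that addresses the early termination is moving end=eof onto the set JE.KeyWords statement. The array statement is moved out of the loop purely for readability, since it is a declarative statement and takes effect at compile time wherever it appears.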