Search code examples
sqlsasrowcount

Selecting rows based on total row count in SAS EG


I have a query selecting all Paid_Dates from a table, in year/month format YYYYMM. My question is, how can I select only the three most recent Paid_Dates, skipping the most recent.

SELECT "Paid_Date"
FROM  PaidDateTable
WHERE "Paid_Date" > '202201';

Results
202201
202202
202203
202204
202205

I would only want to return 202202, 202203, 202204. I would also like this to be dynamic, meaning the total number of rows can change as the year goes on and would always select the last three prior to the latest paid date. I am wondering if there is a way to do this based on total row count? Something along the lines of:

WHERE RowNumber = TotalRowCount - 1 OR RowNumber = TotalRowCount - 2 OR RowNumber = TotalRowCount - 3

Solution

  • Is the data in an actual SAS dataset? Does the data only have one observation per month? Is it already sorted by month? Then skip the PROC SQL code and use normal SAS code to pick just the observations you want.

    data want;
      do point=max(1,nobs-3) to nobs-1;
        set have point=point nobs=nobs;
        output;
      end;
      stop;
    run;
    

    Are the dates you want from the file consistently the same number of months away from the current date? For example you posted on June 2022 that you want just the months from February 2022 to April 2022. So those are -4 to -2 months away from today.

    where paid_date between "%sysfunc(intnx(month,%sysfunc(today()),-4),yymmn6.)"
      and "%sysfunc(intnx(month,%sysfunc(today()),-2),yymmn6.)"
    

    Or do you really need to check the maximum value in the dataset?

    where input(paid_date,yymm6.)
    between intnx('month',max(input(paid_date,yymm6.)),-3)
        and intnx('month',max(input(paid_date,yymm6.)),-1)
    

    If the data is large you might improve performance by using IN operator instead of comparison operator. So you want to generate somehow

    where paid_date in ('202204','202203','202202')
    

    If it is based on today's date you something like this to put that list into a macro varaible.

     data _null_;
        length string $50 ;
        do index=-4 to -2;
          string=catx(',',string,quote(put(intnx('month',today(),index),yymmn6.),"'"));
        end;
        call symputx('range',string);
      run;
    
      ....
      where paid_date in (&range)
    

    If it is based on the last date in the file then adjust accordingly.