sas subset sampling

Subset data by group by proportion in SAS


In this data, I need to take a subset from each group according to a given percentage. For example,

Obs Group Score
1     A    1
2     A    2
3     B    1
4     B    1
5     C    3
6     C    1
7     C    1
8     A    1
9     A    3
10    A    1
11    A    2
12    B    3
13    C    2

I need a subset of 10 observations. The sample must include every group, and observations with a score of 1 take priority. Each group is assigned a percentage of the sample: say 50% for A, 20% for B, and 30% for C.

I tried using proc surveyselect, but it failed: the number of alloc values does not match the number of strata.

proc surveyselect data=example out=test sampsize=10;
strata group score/alloc=(0.5 0.2 0.3);
run;
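
A likely reason for the mismatch: `strata group score` defines one stratum per Group and Score combination (eight in this data), while `alloc=` lists only three proportions. A minimal sketch, assuming the data set is named `example` as in the attempt above, stratifies on `group` alone. The seed value is arbitrary and only makes the draw repeatable. Note that this selects rows at random within each group, so on its own it does not give a score of 1 any priority.

```sas
/* STRATA expects the input to be sorted by the strata variable. */
proc sort data=example;
    by group;
run;

/* One stratum per group (A, B, C), so three alloc values now match. */
/* seed=12345 is an arbitrary, assumed value for reproducibility.    */
proc surveyselect data=example out=test sampsize=10 seed=12345;
    strata group / alloc=(0.5 0.2 0.3);
run;
```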

Solution

  • I don't know proc surveyselect well, so here is a data step version.

    data have;
        input Obs Group$ Score;
        cards;
    1     A    1
    2     A    2
    3     B    1
    4     B    1
    5     C    3
    6     C    1
    7     C    1
    8     A    1
    9     A    3
    10    A    1
    11    A    2
    12    B    3
    13    C    2
    ;
    run;
    
    /* Sort so that, within each group, rows with a score of 1 come first. */
    proc sort data=have;
        by Group Score;
    run;
    
    data want;
    
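        /* Group labels and the number of rows to keep from each group: */
        /* 50%, 20% and 30% of a 10-row sample.                          */
        array _Dist_[3]$ _temporary_('A','B','C');
        array _Upper_[3] _temporary_(5,2,3);
        /* Running count of rows read so far in each group. */
        array _Count_[3] _temporary_;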
    
        do i = 1 to rec;
            set have nobs=rec point=i;
            do j = 1 to dim(_Dist_);
                _Count_[j] + (Group=_Dist_[j]);
                /* Keep the row while its group is still within its quota. */
                if _Count_[j] <= _Upper_[j] and Group = _Dist_[j] then output;
            end;
        end;
        stop;
        drop j;
    run;
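
The same selection can also be sketched with BY-group processing, deriving each quota from the percentages instead of hard-coding 5, 2 and 3. This is only an illustration under a few assumptions: `have` is already sorted by Group and Score as above, and the macro variable `total`, the output name `want_by`, and the helper variables `_n` and `_quota` are hypothetical names introduced here.

```sas
%let total = 10;   /* assumed: overall sample size */

data want_by;
    set have;      /* must be sorted by Group Score */
    by Group;

    /* Restart the row counter at the first row of each group. */
    if first.Group then _n = 0;
    _n + 1;

    /* Quota = group share of the total sample, rounded to whole rows. */
    select (Group);
        when ('A') _quota = round(0.5 * &total);
        when ('B') _quota = round(0.2 * &total);
        when ('C') _quota = round(0.3 * &total);
        otherwise  _quota = 0;
    end;

    /* Subsetting IF: keep only rows that fall within the group's quota. */
    if _n <= _quota;
    drop _n _quota;
run;
```

With the sample data this keeps 5 rows from A, 2 from B and 3 from C. Because the rows are sorted by Score within each group, those with a score of 1 are taken first.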