
PROC SURVEYSELECT ALLOC= option reads my allocation dataset incorrectly?


OK, so I have a dataset that I need to sample based on another dataset's proportions. I already have an allocation dataset with 2 columns, strata and _alloc_. When I run the following code:

proc surveyselect data=have out=want outall method = srs sampsize=10000 seed=1994;
strata strata/alloc = alloc;
id name;
run;

I get this error:

ERROR: The sum of the _ALLOC_ proportions in the data set ALLOC must equal 1.

I checked my allocation dataset, and the proportions across the strata add up to 1. I'm not sure whether the problem is in my dataset or in my code. I've already sorted the have dataset by strata, and I sorted the allocation dataset by strata as well. I've been using the same (or a similar) script to randomly sample from many other datasets, so I'm not sure why it isn't working for this one.

Any ideas? Thanks!

Edit: For more info, I'm using SAS Enterprise Guide 7.1.

For reference, the alloc table is as follows (I can't share the real strata names, but I've checked that they match the strata in my have dataset exactly):

_alloc_      | strata
0.3636363636 | strata1
0.0909090909 | strata2
0.0909090909 | strata3
0.0909090909 | strata4
0.1818181818 | strata5
0.0909090909 | strata6
0.0909090909 | strata7

I am also perplexed. As I mentioned, this code works on other datasets, just not on this one. In case it is relevant, I created the alloc dataset in R and imported it into SAS.
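Since the error is about the sum, it can help to check the total as SAS actually stores it, not just as it is displayed. The proportions in the table look like 4/11, 1/11 and 2/11 rounded to 10 decimal places, so as printed they add to 0.9999999999 rather than exactly 1. The sketch below rebuilds the table from the values shown above in a hypothetical dataset ALLOC_DEMO and totals `_alloc_` into a hypothetical table ALLOC_TOTAL; to check the real data, point the PROC SQL step at your imported ALLOC dataset instead. It only reports the stored total; it does not settle whether a gap that small is what triggers the error.

```sas
/* Hypothetical copy of the allocation table, typed in from the values
   shown in the question. To check your real data, skip this step and
   use your imported ALLOC dataset in the PROC SQL step below. */
data alloc_demo;
    infile datalines dlm='|';
    input _alloc_ strata $;
    datalines;
0.3636363636|strata1
0.0909090909|strata2
0.0909090909|strata3
0.0909090909|strata4
0.1818181818|strata5
0.0909090909|strata6
0.0909090909|strata7
;
run;

/* Count the strata and total the stored proportions. BEST32. shows
   as many digits as possible, so a small rounding gap is visible. */
proc sql;
    create table alloc_total as
        select count(*)     as n_strata,
               sum(_alloc_) as total_alloc format=best32.
        from alloc_demo;
quit;

proc print data=alloc_total noobs;
run;
```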


Solution

  • List the distinct strata values in your dataset and compare them with the strata in your alloc dataset:

    proc sql noprint;
        create table check as
            select distinct strata
            from have
        ;
    quit;
    

    If have contains any groups that do not exist in the alloc dataset, or vice versa, you will get this error message. In the example code below, alloc has 7 strata but have has only 6.

    data alloc;
        infile datalines dlm='|';
        input _alloc_ strata$;
        datalines;
        0.3636363636 | strata1
        0.0909090909 | strata2
        0.0909090909 | strata3
        0.0909090909 | strata4
        0.1818181818 | strata5
        0.0909090909 | strata6
        0.0909090909 | strata7
        ;
    run;
    
    /* HAVE contains only 6 of the 7 strata listed in ALLOC */
    data have;
        do strata = 'strata1', 'strata2', 'strata3', 'strata4', 'strata5', 'strata6';
            do i = 1 to 100;
                name = 'name';
                output;
            end;
        end;
    run;
    
    proc surveyselect data=have 
                      out=want 
                      outall 
                      method = srs 
                      sampsize=10
                      seed=1994
                      ;
    
        strata strata / alloc = alloc;
        id name;
    run;
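To go one step beyond listing the distinct values, the two lists can be compared directly. The sketch below is one way to do that with the EXCEPT set operator in PROC SQL, assuming both datasets keep the stratum in a variable named `strata` (as in the question) and that HAVE and ALLOC already exist, for example from the example code above. The output table names MISSING_FROM_ALLOC and MISSING_FROM_HAVE are hypothetical.

```sas
/* Assumes HAVE and ALLOC already exist and both store the
   stratum in a variable named STRATA. */
proc sql;
    /* Strata that appear in HAVE but have no row in ALLOC */
    create table missing_from_alloc as
        select distinct strata from have
        except
        select strata from alloc;

    /* Strata listed in ALLOC that never occur in HAVE */
    create table missing_from_have as
        select strata from alloc
        except
        select distinct strata from have;
quit;

title 'Strata in HAVE but not in ALLOC';
proc print data=missing_from_alloc noobs;
run;

title 'Strata in ALLOC but not in HAVE';
proc print data=missing_from_have noobs;
run;

/* Clear the titles so later output is not affected */
title;
```

If either table has rows, those strata are the mismatch; in the example above, strata7 is the one stratum that exists only in ALLOC. Because the comparison is exact, values that differ only in letter case or leading blanks will also show up here, which is worth ruling out when a table was built in one tool and imported into another.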