Ok, so I have a dataset that I have to sample based on another dataset's proportions, and I already have an allocation dataset with 2 columns: strata and _alloc_. When I run the following code:
proc surveyselect data=have out=want outall method = srs sampsize=10000 seed=1994;
strata strata/alloc = alloc;
id name;
run;
I get this error:
ERROR: The sum of the _ALLOC_ proportions in the data set ALLOC must equal 1.
I checked my allocation dataset and the proportions across the strata do sum to 1. I'm not sure if there's an issue with my dataset or code. I've already sorted the have dataset by strata, and I also sorted the allocation dataset by strata. I've been using the same (or similar) script to randomly sample from many different datasets before, so I'm not sure why it isn't working for this one.
Any ideas? Thanks!
Edit: For more info, I'm using SAS Enterprise Guide 7.1.
For reference, the alloc table is as follows (I can't give real strata names, but I've checked and they are identical to the strata in my have dataset):
_alloc_ | strata
0.3636363636 | strata1
0.0909090909 | strata2
0.0909090909 | strata3
0.0909090909 | strata4
0.1818181818 | strata5
0.0909090909 | strata6
0.0909090909 | strata7
I am also perplexed. As I mentioned, this code worked on other datasets, just not this one. In case it's relevant, I created the alloc dataset using R and imported it into SAS.
List the distinct strata values in your have dataset:
proc sql noprint;
create table check as
select distinct strata
from have
;
quit;
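The distinct list still has to be compared against alloc by eye. A rough sketch of a direct comparison, assuming the datasets are named have and alloc and both carry a character variable strata as in the question (alloc_only and have_only are just names I made up for the output tables):

```sas
proc sql;
    /* Strata listed in alloc that never occur in have */
    create table alloc_only as
    select strata from alloc
    except
    select strata from have;

    /* Strata that occur in have but are missing from alloc */
    create table have_only as
    select strata from have
    except
    select strata from alloc;
quit;
```

Any row in either table points to a mismatch. The comparison is on exact character values, so differences in case or stray leading blanks (which could creep in during an import from R) would show up here too.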
If there are any extra groups that do not exist in the alloc dataset or vice versa, your error message will appear. In the example code below, alloc has 7 strata but have has 6 strata.
data alloc;
infile datalines dlm='|';
input _alloc_ strata$;
datalines;
0.3636363636 | strata1
0.0909090909 | strata2
0.0909090909 | strata3
0.0909090909 | strata4
0.1818181818 | strata5
0.0909090909 | strata6
0.0909090909 | strata7
;
run;
/* Only have 6 strata instead of 7 in the data */
data have;
    do strata = 'strata1', 'strata2', 'strata3', 'strata4', 'strata5', 'strata6';
        do i = 1 to 100;
            name = 'name';
            output;
        end;
    end;
run;
proc surveyselect data=have
out=want
outall
method = srs
sampsize=10
seed=1994
;
strata strata / alloc = alloc;
id name;
run;
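If the mismatch turns out to be strata in alloc that have no rows in have, as in the example above, one possible workaround is to drop those strata from alloc and rescale the remaining proportions so they sum to 1 again. This is only a sketch under that assumption: alloc_matched and alloc_fixed are hypothetical names, and whether rescaling is acceptable depends on what the proportions are supposed to represent in your design.

```sas
proc sql;
    /* Keep only allocation rows whose stratum actually occurs in have */
    create table alloc_matched as
    select strata, _alloc_
    from alloc
    where strata in (select distinct strata from have);

    /* Rescale the remaining proportions so they sum to 1
       (PROC SQL remerges the overall sum back onto each row) */
    create table alloc_fixed as
    select strata, _alloc_ / sum(_alloc_) as _alloc_
    from alloc_matched
    order by strata;
quit;
```

You would then point the strata statement at alloc_fixed instead of alloc (strata strata / alloc = alloc_fixed;). If the mismatch runs the other way, with strata in have that are missing from alloc, this does not help, and the allocation table itself needs the missing rows.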