I need to resample from a real (cluster) trial data set. So far, I have used the following PROC SURVEYSELECT
procedure in SAS to sample 10 clusters from the trial with replacement, with 50% of clusters coming from the control arm and 50% coming from the treatment arm. I repeat this 100 times to get 100 replicates with 10 clusters each and equal allocation.
proc surveyselect data=mydata out=resamples reps=100 sampsize=10 method=urs outhits;
cluster site;
strata rx / alloc=(0.5 0.5);
run;
Since I am using unrestricted random sampling (method=urs
) to sample with replacement, I specified outhits so that SAS will inform me when a cluster is sampled more than once in each replication.
However, within each replicate in the output resamples
dataset, I have not found a way to easily assign a unique identifier to clusters that appear more than once. If a cluster is sampled m times within a replicate, the observations within that cluster are simply repeated m times.
I attempted to use PROC SQL
to identify distinct cluster ids and their occurrences within each replication, thinking I could use that to duplicate IDs as appropriate before joining additional data as necessary.
proc sql;
create table clusterselect as
select distinct r.replicate, r.site, r.numberhits from resamples as r;
quit;
However, I cannot figure out how to simply replicate rows in SAS.
Any help is appreciate, whether it be modifying PROC SURVEYSELECT to yield a unique cluster id within each replication or repeating cluster IDs as appropriate based on numberhits.
Thank you!
Here's what I've done:
/* 100 resamples with replacement */
proc surveyselect data=mydata out=resamples reps=100 sampsize=10 method=urs outhits;
cluster site;
strata rx / alloc=(0.5 0.5);
run;
/* identify unique sites per replicate and their num of appearances (numberhits) */
proc sql;
create table clusterSelect as
select distinct r.replicate, r.site, r.numberhits from resamples as r;
quit;
/* for site, repeat according to numberhits */
/* create unique clusterId */
data uniqueIds;
set clusterSelect;
do i = 1 to numberhits;
clusterId = cat(site, i);
output;
end;
drop i numberhits;
run;
/* append data to cluster, retaining unique id */
proc sql;
create table resDat as
select
uid.replicate,
uid.clusterId,
uid.site,
mydata.*
from uniqueIds as uid
left join mydata
on uid.site = mydata.site
quit;
Are you just asking how to convert one observation into the number of observations indicated in the NUMBERHITS variable?
data want;
set resamples;
do _n_=1 to numberhits;
output;
end;
run;