Tags: random, filter, spss

Select random 50% of sample, but only 1 person per couple


Essentially, I'm trying to do stratified random sampling. I want to run an analysis on data from heterosexual couples. I need to select a random 50% of the women and a random 50% of the men, such that none of the selected men is married to a selected woman. I know how to filter a random percentage of the total sample, but not how to ensure that only one person per household is selected.

My data look like this:

couple person gender Q1 Q2 Q3 Q4 Q5

1 1 0 3.5 4.2 2.3 3.3 4.3

1 2 1 3.2 2.5 2.1 3.7 5.6

2 1 1 3.7 2.6 3.3 4.2 5.1

2 2 0 3.0 3.5 2.1 3.6 5.4

It's in long format, so each row represents a person and there are two people per couple.

EDITED for more details

hhid = couple
hhidpn = person
ragender = gender, in which 1 = male, 2 = female
SPAQ1-8 = items 1-8 of a self-perceptions of aging scale

[1]: https://i.sstatic.net/yKQ7k.png

Solution

  • My suggestion is to randomly divide the couples in two equal groups, and then select the women in one group and the men in the other group.

    First I'll reconstruct your example data to demonstrate on:

    data list list/couple person gender (3f1) Q1   Q2   Q3   Q4    Q5 (5f2.1).
    begin data
    1  1  0   3.5  4.2  2.3  3.3  4.3
    1  2  1   3.2  2.5  2.1  3.7  5.6
    2  1  1   3.7  2.6  3.3  4.2  5.1
    2  2  0   3.0  3.5  2.1  3.6  5.4
    3  1  0   3.5  4.2  2.3  3.3  4.3
    3  2  1   3.2  2.5  2.1  3.7  5.6
    4  1  1   3.7  2.6  3.3  4.2  5.1
    4  2  0   3.0  3.5  2.1  3.6  5.4
    end data.
    

    Now that we have a dataset, we can do the sampling you need:

    EDITED for a shorter process - using a version of @rossum's suggestion:

    * first we give each couple a random number, and then use it to sort the couples randomly.
    sort cases by couple.
    compute randorder=uniform(100).
    if couple=lag(couple) randorder=lag(randorder).
    sort cases by randorder.
    
    * now we create a running index for the couples, and use it to select males or females according to odd or even index.
    compute coupleNum=1.
    if $casenum>1 coupleNum=lag(coupleNum)+(couple<>lag(couple)).
    compute selected=(mod(coupleNum, 2)=gender).
    exe.
    

    Now that you have created the selection variable, you can use it with filter or with select if to continue to the analysis.
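
    As a rough sketch of both options, assuming the variable names from the example data above (Q1 to Q5) and using descriptives only as a stand-in for whatever analysis you actually want to run:

    * Option 1: filter temporarily, the unselected cases stay in the dataset.
    filter by selected.
    descriptives variables=Q1 to Q5.
    filter off.

    * Option 2: keep only the selected cases for everything that follows.
    select if selected=1.
    execute.

    The filter version is usually the safer choice, because select if permanently drops the unselected partners from the active dataset.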

    EDIT: The above code works for gender coded 0,1. The edit to the OP shows the values for gender are actually 1,2. Since mod(coupleNum, 2) only yields 0 or 1, add 1 to it so that it can match those codes. So the final computation of selected should be done this way instead of as above:

    compute selected=(mod(coupleNum, 2)+1=gender).
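
    If you want to check the result, something along these lines should work. It is only a sketch: nSelected is a made-up helper variable name, and the other names are the ones from the example data, so adjust them to your real variables (hhid, ragender):

    * Count selected people per couple, every couple should show exactly 1.
    aggregate outfile=* mode=addvariables
      /break=couple
      /nSelected=sum(selected).
    frequencies variables=nSelected.

    * Cross-tabulate to see how many men and women ended up selected.
    crosstabs /tables=gender by selected.

    With an even number of couples the selected men and women should come out equal in number; with an odd number they will differ by one.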