My question is an extension of that found here: Construct new variable from given 5 categorical variables in Stata
I am an R user and I have been struggling to adjust to the Stata syntax. Also, I'm used to being able to Google for R documentation/examples online and haven't found as many resources for Stata, so I've come here.
I have a data set where the rows represent individual people and the columns record various attributes of these people. There are 5 categorical variables (white, hispanic, black, asian, other) that hold binary response data, 0 or 1 ("No" or "Yes"). I want to create a mosaic plot of race vs response data using the spineplots package. However, I believe I must first combine all 5 of these variables into a single categorical variable with 5 levels that keeps the labels, so I can see the response rate for each ethnicity. I've been playing around with the egen function but haven't been able to get it to work. Any help would be appreciated.
Edit: Added a depiction of what my data looks like and what I want it to look like.
My data right now:
person_id,black,asian,white,hispanic,responded
1,0,0,1,0,0
2,1,0,0,0,0
3,1,0,0,0,1
4,0,1,0,0,1
5,0,1,0,0,1
6,0,1,0,0,0
7,0,0,1,0,1
8,0,0,0,1,1
What I want is to use the tabulate command to produce a table like the following:
respond, black, asian, white, hispanic
responded to survey | 20, 30, 25, 10, 15
did not respond | 15, 20, 21, 23, 33
It seems like you want a single indicator variable rather than multiple {0,1} dummies. The easiest way is probably with a loop; another option is to use cond() to generate a new indicator variable (note that you may want to catch respondents for whom all the race dummies are 0 in an 'other' group), label its values (and the values of responded), and then create your frequency table:
clear
input person_id black asian white hispanic responded
1 0 0 1 0 0
2 1 0 0 0 0
3 1 0 0 0 1
4 0 1 0 0 1
5 0 1 0 0 1
6 0 1 0 0 0
7 0 0 1 0 1
8 0 0 0 1 1
9 0 0 0 0 1
end
* Option 1: loop over the dummies; anyone with no dummy set stays "other"
gen race = "other"
foreach v of varlist black asian white hispanic {
    replace race = "`v'" if `v' == 1
}
* Option 2: nested cond(); the label codes must match the codes assigned below
label define race2 1 "black" 2 "asian" 3 "white" 4 "hispanic" 99 "other"
gen race2:race2 = cond(black == 1, 1, ///
    cond(asian == 1, 2, ///
    cond(white == 1, 3, ///
    cond(hispanic == 1, 4, 99))))
label define responded 0 "did not respond" 1 "responded to survey"
label values responded responded
tab responded race
with the result
                    |                         race
          responded |     asian      black   hispanic      other      white |     Total
--------------------+-------------------------------------------------------+----------
    did not respond |         1          1          0          0          1 |         3
responded to survey |         2          1          1          1          1 |         6
--------------------+-------------------------------------------------------+----------
              Total |         3          2          1          1          2 |         9
tab responded race2
yields the same results with a different ordering (by the numeric values of race2 rather than the alphabetical order of the strings in race).
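Two follow-ups may be worth sketching, both under assumptions the question does not confirm. First, the loop and cond() differ if a person has more than one dummy set to 1: the loop keeps the last match in the varlist, while cond() keeps the first, so it can help to count the flags per row before combining them. Second, column percentages give the response rate within each group, which is what the question ultimately asks for. The variable name nflags is made up for illustration, the data are the nine example rows from above, and the commented-out lines assume the plotting package meant in the question is spineplot from SSC.

```stata
* Sketch only: reuses the nine example rows from the answer above.
clear
input person_id black asian white hispanic responded
1 0 0 1 0 0
2 1 0 0 0 0
3 1 0 0 0 1
4 0 1 0 0 1
5 0 1 0 0 1
6 0 1 0 0 0
7 0 0 1 0 1
8 0 0 0 1 1
9 0 0 0 0 1
end

* Count how many race dummies are set per person (nflags is a hypothetical name).
egen nflags = rowtotal(black asian white hispanic)
tab nflags

* Stop with an error if anyone has more than one dummy set to 1.
assert nflags <= 1

* Build the labelled categorical variable; label codes match the cond() codes.
label define race2 1 "black" 2 "asian" 3 "white" 4 "hispanic" 99 "other"
gen race2:race2 = cond(black == 1, 1, ///
    cond(asian == 1, 2, ///
    cond(white == 1, 3, ///
    cond(hispanic == 1, 4, 99))))
label define responded 0 "did not respond" 1 "responded to survey"
label values responded responded

* Column percentages: share of each race group that did or did not respond.
tab responded race2, column nofreq

* Assumption: the package meant in the question is spineplot from SSC.
* ssc install spineplot
* spineplot responded race2
```

If the assert fails on real data, the overlapping cases would need an explicit rule (for example a separate "multiple" category) before either approach is applied.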