I am using the survey package by Thomas Lumley to create cross tabs and SE's. I am struggling to specify the denominator of the cross tabulation.
library(survey)
data <- read_table2("Q50_1 Q50_2 Q38 Q90 pov gender wgt id
yes 3 Yes NA High M 1.3 A
NA 4 No 2 Med F 0.4 B
no 2 NA 4 Low F 1.2 C
maybe 3 No 2 High M 0.5 D
yes NA No NA High M 0.7 E
no 2 Yes 3 Low F 0.56 F
maybe 4 Yes 2 Med F 0.9 G")
design <- svydesign(id =~id,
weights = ~wgt,
nest = FALSE,
data = data)
svymean(~interaction(Q50_1,gender=="F"), design, na.rm = T)
This gives me:
mean SE
interaction(Q50_1, gender == "F")maybe.FALSE 0.096899 0.1043
interaction(Q50_1, gender == "F")no.FALSE 0.000000 0.0000
interaction(Q50_1, gender == "F")yes.FALSE 0.387597 0.2331
interaction(Q50_1, gender == "F")maybe.TRUE 0.174419 0.1725
interaction(Q50_1, gender == "F")no.TRUE 0.341085 0.2233
interaction(Q50_1, gender == "F")yes.TRUE 0.000000 0.0000
This is not as useful to me because the denominator includes TRUE FALSE values for every combination, whereas I am only interested in the mean that is true. So, I could easily find the percentage of TRUE as follows:
dat <- as.data.frame(svymean(~interaction(Q50_1,gender=="F"), design, na.rm = T)) %>% tibble::rownames_to_column("question")
dat %>% tidyr::separate(question,c("question",'response'), sep = "\\)", extra = "merge") %>%
mutate(question = str_replace(question,"interaction\\("," ")) %>%
tidyr::separate(response,c('value', 'bool'), sep ="\\." ) %>%
tidyr::separate(question,c('question', 'group'), sep ="\\," ) %>%
tidyr::separate(group,c('group_level', 'group'), sep ="\\==" ) %>%
filter(bool=='TRUE') %>%
group_by(question, group_level, group) %>%
mutate(sum_true = sum(mean)) %>%
mutate(mean= mean/sum_true)
This gives me:
question group_level group value bool mean SE sum_true
<chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
" Q50_1" " gender " " \"F\"" maybe TRUE 0.338 0.173 0.516
" Q50_1" " gender " " \"F\"" no TRUE 0.662 0.223 0.516
" Q50_1" " gender " " \"F\"" yes TRUE 0 0 0.516
The means are exactly what I want, but the SEs are associated with a different denominator and don't represent the manipulated mean. Is there a way to call the svymean to present the mean and SE of ONLY the TRUE values in the denominator?
I thought something like this might do (but it does not work):
svymean(~interaction(Q50_1,gender=="F"[TRUE]), design, na.rm = T)
mean SE
interaction(Q50_1, gender == "F"[TRUE])maybe.TRUE 0.338 0.0725
interaction(Q50_1, gender == "F"[TRUE])no.TRUE 0.0.662 0.0233
interaction(Q50_1, gender == "F"[TRUE])yes.TRUE 0.0 0.0000
To get the percentage of women who responded to each answer you want
svymean(~Q50_1, subset(design, gender== "F"),na.rm=TRUE)
or equivalently (because that's how svyby
does it)
svyby(~Q50_1, ~gender, design, svymean, na.rm = TRUE)
If you want to get the empty category as well, you need to convert the ~Q50_1
variable to a factor -- that's the point of factors (vs strings): they know what levels they have.
If you to be able to extract parts of the output programmatically, use the coef
and SE
functions
data$Q50_1<-factor(data$Q50_1)
design <- svydesign(id =~id,
weights = ~wgt,
nest = FALSE,
data = data)
svymean(~Q50_1, subset(design, gender== "F"),na.rm=TRUE)
svyby(~Q50_1, ~gender, design, svymean, na.rm = TRUE)[1,]
coef(svyby(~Q50_1, ~gender, design, svymean, na.rm = TRUE))
SE(svyby(~Q50_1, ~gender, design, svymean, na.rm = TRUE))
These don't agree with what you got using ~interaction
, because what you got that way doesn't match what you said you want. The interaction
analysis gives you the percentage of people who are women and also responded yes
, not the percentage of yes
responses among women. To put it another way, the 6 percentages you get with the interaction
analysis add to 100%, not to 200%.
> sum(coef(svymean(~interaction(Q50_1,gender=="F"), design, na.rm = T)))
[1] 1