Survey package in R warning message


I'm trying to use the survey package to find means for categorical variables from a random survey. I'm running into an issue using svyby() that outputs: "Warning message: In matrix(1:(ns * reps), ncol = reps, nrow = ns, byrow = TRUE) : data length [12] is not a sub-multiple or multiple of the number of rows [5]"

My question is two-fold. First, what is causing this warning, which comes with repeated results in an uninterpretable table? Second, a more theoretical question: is there a problem with subsetting the data before creating the design object if we only need means? As I understand it from other posts, doing so affects only the standard errors.

Here's the code I'm using, run on both the full survey results and a subset:

#read-in zip file
library(survey)
library(rio)
td <- tempdir()
tf <- tempfile(tmpdir=td, fileext=".zip")
download.file("https://www.federalreserve.gov/consumerscommunities/files/SHED_public_use_data_2020_(CSV).zip", tf)
file_names <- unzip(tf, list=TRUE)
unzip(tf, exdir=td, overwrite=TRUE)
data <- import(file.path(td, file_names$Name[1]))

#remove weight NAs
data <- data[!is.na(data$weight_pop),]

#create subset
data.subset <- data[data$BK1 == "Yes" & data$afs == "Yes",]

#create svy designs
design <- svydesign(ids = ~CaseID
                    , weights = ~weight_pop
                    , na.rm = TRUE
                    , data = data) #full survey
design2 <- svydesign(ids = ~CaseID
                    , weights = ~weight_pop
                    , na.rm = TRUE
                    , data = data.subset) #subset of survey

svyby(~BK2_a,~race_5cat,design,svymean)
svyby(~BK2_a,~race_5cat,design2,svymean)

svyby(~BK2_b,~race_5cat,design,svymean)
svyby(~BK2_b,~race_5cat,design2,svymean)

svyby(~BK2_c,~race_5cat,design,svymean)
svyby(~BK2_c,~race_5cat,design2,svymean)

#verify observations
table(data$BK2_a,data$race_5cat)
table(data.subset$BK2_a,data.subset$race_5cat)

table(data$BK2_b,data$race_5cat)
table(data.subset$BK2_b,data.subset$race_5cat)

table(data$BK2_c,data$race_5cat)
table(data.subset$BK2_c,data.subset$race_5cat)

Some of the calls give the results I'd hope for, such as svyby(~BK2_b,~race_5cat,design2,svymean) or svyby(~BK2_c,~race_5cat,design,svymean), but the others throw the warning message and return uninterpretable tables with repeated figures.

Using table() to look at the observations suggests the issue might lie in the "Refused" response. But when I replace this response with NA via "data <- data %>% mutate(BK2_a = str_replace(BK2_a,"Refused", replacement = NA_character_))" and then set na.rm.all=TRUE in svyby(), the output contains NAs and NaNs. I've also tried converting the columns to factors, which made no difference. I'm a novice with the survey package, so any help is greatly appreciated.


Solution

  • First, rather than creating design2 from the subsetted data, I would use subset(design, BK1 == "Yes" & afs == "Yes") whenever you want to limit the analysis to that subgroup. As you mention, this only matters if you care about getting the correct standard errors.

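    Second, to resolve your issue, try coding the response variables BK2_a, BK2_b, and BK2_c as factors. A factor carries its full set of levels into every race_5cat group, which should give svyby() a result of the same length for each group. Alternatively, you can use svytable() directly if you only want the proportions for each group. See below: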

    data <- data[!is.na(data$weight_pop),]
    data$BK2_a_f <- factor(data$BK2_a)
    data$BK2_b_f <- factor(data$BK2_b)
    data$BK2_c_f <- factor(data$BK2_c)
    
    #create subset
    data.subset <- data[data$BK1 == "Yes" & data$afs == "Yes",]
    
    #create svy designs
    design <- svydesign(ids = ~CaseID, weights = ~weight_pop, 
                        na.rm = TRUE, data = data) #full survey
    design2 <- svydesign(ids = ~CaseID, weights = ~weight_pop, 
                         na.rm = TRUE, data = data.subset) #subset of survey
    design2_better <- subset(design, BK1 == "Yes" & afs == "Yes")
    
    
    svyby(~BK2_a_f, ~race_5cat, design, svymean)
    svyby(~BK2_a_f, ~race_5cat, design2, svymean)
    svyby(~BK2_a_f, ~race_5cat, design2_better, svymean)
    
    svyby(~BK2_b_f, ~race_5cat, design, svymean)
    svyby(~BK2_b_f, ~race_5cat, design2, svymean)
    svyby(~BK2_b_f, ~race_5cat, design2_better, svymean)
    
    svyby(~BK2_c_f, ~race_5cat, design, svymean)
    svyby(~BK2_c_f, ~race_5cat, design2, svymean)
    svyby(~BK2_c_f, ~race_5cat, design2_better, svymean)
    
    # Alternative approach to getting proportions if you don't care about SEs
    prop.table(svytable(~race_5cat + BK2_a, design), 1)
    prop.table(svytable(~race_5cat + BK2_b, design), 1)
    prop.table(svytable(~race_5cat + BK2_c, design), 1)
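
    A likely reason the factor recoding helps: when a character column is turned into indicator columns one group at a time, each group only produces columns for the values it actually contains, so a group with no "Refused" answers yields a shorter result than the others. The base-R sketch below illustrates that mechanism with model.matrix(). It is not the survey package's own code, and toy, grp, resp, resp_f and cols_by_group are made-up names and values for illustration only.

    ```r
    # Toy data with made-up values (not taken from the SHED file).
    toy <- data.frame(
      grp  = c("A", "A", "A", "B", "B"),
      resp = c("Yes", "No", "Refused", "Yes", "No"),
      stringsAsFactors = FALSE
    )
    toy$resp_f <- factor(toy$resp)  # levels: No, Refused, Yes

    # Hypothetical helper: names of the indicator columns built within each group.
    cols_by_group <- function(formula, data) {
      lapply(split(data, data$grp), function(d) colnames(model.matrix(formula, d)))
    }

    chr_cols <- cols_by_group(~ 0 + resp, toy)    # character response
    fct_cols <- cols_by_group(~ 0 + resp_f, toy)  # factor response

    lengths(chr_cols)  # A has 3 columns, B has only 2 (no "Refused" in group B)
    lengths(fct_cols)  # 3 columns in both groups, because the levels are kept
    ```

    With the character column the per-group results are ragged, which is the kind of mismatch the matrix() warning complains about. With the factor column every group has the same columns, and a level that does not occur in a group simply gets an all-zero column.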