Search code examples
rdplyrsurvey

r survey package: Geometric means with confidence intervals


I am roughly reproducing this code:

library(haven)
library(survey)
library(dplyr)

nhanesDemo <- read_xpt(url("https://wwwn.cdc.gov/Nchs/Nhanes/2015-2016/DEMO_I.XPT"))

# Rename variables into something more readable
nhanesDemo$fpl <- nhanesDemo$INDFMPIR
nhanesDemo$age <- nhanesDemo$RIDAGEYR
nhanesDemo$gender <- nhanesDemo$RIAGENDR
nhanesDemo$persWeight <- nhanesDemo$WTINT2YR
nhanesDemo$psu <- nhanesDemo$SDMVPSU
nhanesDemo$strata <- nhanesDemo$SDMVSTRA

# Select the necessary columns
nhanesAnalysis <- nhanesDemo %>%
  select(fpl, age, gender, persWeight, psu, strata)

# Set up the design
nhanesDesign <- svydesign(id      = ~psu,
                          strata  = ~strata,
                          weights = ~persWeight,
                          nest    = TRUE,
                          data    = nhanesAnalysis)

# Select those between the agest of 18 and 79
ageDesign <- subset(nhanesDesign, age > 17 & age < 80)

They calculate a simple arithmetic mean as follows:

# Arithmetic mean
svymean(~age, ageDesign, na.rm = TRUE)

I would like to calculate (1) the geometric mean using svymean or some related method (2) with a 95% confidence interval of the geometric mean without having to manually use the SE, if possible. I tried

svymean(~log(age), log(ageDesign), na.rm = TRUE)

but that throws an error. How do I do this?


Solution

  • You want to take the logarithm of the variable, not of the whole survey design (which doesn't make any sense)

    Here's how to compute the mean of log(age), get a confidence interval, and then exponentiate it to the geometric mean scale

    > svymean(~log(age), ageDesign, na.rm = TRUE)
               mean     SE
    log(age) 3.7489 0.0115
    > meanlog<-svymean(~log(age), ageDesign, na.rm = TRUE)
    > confint(meanlog)
                2.5 %   97.5 %
    log(age) 3.726351 3.771372
    > exp(confint(meanlog))
                2.5 %   97.5 %
    log(age) 41.52729 43.43963
    

    Or, you can exponentiate first and then construct the confidence interval:

    > (geomean<-svycontrast(meanlog, quote(exp(`log(age)`))))
              nlcon     SE
    contrast 42.473 0.4878
    > confint(geomean)
                2.5 %   97.5 %
    contrast 41.51661 43.42879
    

    I would expect the first approach to give better confidence intervals in general, but it makes almost no difference here.