Search code examples
rsurvey

Using svyrecvar to get the variance of a statistic in the "survey" R package


I am writing an extension to the survey R package. My function estimates kurtosis with complex survey data, but I am not sure how to get the standard error of these statistics, like the output from existing functions like svymean and svytotal.

svykurt <- function(
    x,
    design,
    na.rm = FALSE,
    excess = TRUE
) {

  if (!inherits(design, "survey.design"))
    stop("design is not a survey design")

  x <- model.frame(x, design$variables, na.action = na.pass)
  x <- as.matrix(x)

  if (ncol(x) > 1)
    stop("Only calculate kurtosis one variable at a time")

  if(na.rm){
    x <- x[!is.na(x)]
  }

  pweights <- 1/design$prob
  psum <- sum(pweights)
  mean_x <- svymean(x, design, na.rm = na.rm)
  var_x <- svyvar(x, design, na.rm = na.rm)

  m4 <- sum(pweights * (x - mean_x)^4) / psum
  kurt <- m4 / var_x^2

  if (excess) {
    kurt <- kurt - 3
  }

  class(kurt) <- "svykurt"

  return(kurt)

}

print.svykurt <- function(x) {
  m <- as.matrix(x, ncol = 1)
  rownames(m) <- names(x)
  colnames(m) <- "kurtosis"

  print(m)
}

At the moment, I am just printing kurtosis without the standard error.

data(api)
dclus1<-svydesign(id=~dnum, weights=~pw, data=apiclus1, fpc=~fpc)
svykurt(x = ~api00, dclus1, na.rm = TRUE)

However, I eventually want to assign the output to class svystat and print a matrix containing both values, as in svymean(x = ~api00, dclus1, na.rm = TRUE).

Looking at survey source code, I can see that svymean and svytotal use a higher-level function called svrecvar, e.g.,

attr(total, "var")<-v<-svyrecvar(x/design$prob,design$cluster,
                                     design$strata, design$fpc,
                                   postStrata=design$postStrata)

I am just not sure how to apply this to my case.


Solution

  • If you want the kurtosis, you are much better off working with svycontrast than with svyrecvar. You can write the variance and fourth central moment in terms of the raw moments (following the answer here) and then transform

    > moments<-svymean(~enroll+I(enroll^2)+I(enroll^3)+I(enroll^4), dstrat)
    > moments
                      mean         SE
    enroll      5.9528e+02 1.8509e+01
    I(enroll^2) 5.4895e+05 4.0250e+04
    I(enroll^3) 7.5137e+08 1.0066e+08
    I(enroll^4) 1.3348e+12 2.7683e+11
    > 
    > central_moments<-svycontrast(moments, list(mu4=quote(-3*enroll^4+6*enroll^2*`I(enroll^2)`-4*enroll*`I(enroll^3)`+`I(enroll^4)`),
    sigma2=quote(`I(enroll^2)`-enroll^2)))
    > central_moments
                nlcon         SE
    mu4    3.3618e+11 1.0458e+11
    sigma2 1.9459e+05 2.3116e+04
    > 
    > 
    > svycontrast(central_moments, quote(mu4/(sigma2*sigma2)))
              nlcon     SE
    contrast 8.8783 1.5678
    

    As a general guide, if you have an explicit formula for a statistic in terms of moments or cell counts or proportions, it's easier to use svycontrast and if you have an estimation equation or maximisation problem it's easier to use svyrecvar.

    To use svyrecvar you would need to work out or look up the influence functions for the kurtosis, which is likely to be feasible but a bit annoying. If you do work out the influence functions, it's actually more straightforward to use svytotal to work out the variance

    For example, in svyVGAM:::svy_vglm.survey.design there's a lot of code to fit a VGAM model and work out its influence functions, followed by

        v <- vcov(svytotal(inffuns, design))
        dimnames(v) <- list(names(coef(fit)), names(coef(fit)))
    

    to do the variance computation.