I am writing an extension to the survey
R package. My function estimates kurtosis with complex survey data, but I am not sure how to get the standard error of these statistics, like the output from existing functions like svymean
and svytotal
.
svykurt <- function(
x,
design,
na.rm = FALSE,
excess = TRUE
) {
if (!inherits(design, "survey.design"))
stop("design is not a survey design")
x <- model.frame(x, design$variables, na.action = na.pass)
x <- as.matrix(x)
if (ncol(x) > 1)
stop("Only calculate kurtosis one variable at a time")
if(na.rm){
x <- x[!is.na(x)]
}
pweights <- 1/design$prob
psum <- sum(pweights)
mean_x <- svymean(x, design, na.rm = na.rm)
var_x <- svyvar(x, design, na.rm = na.rm)
m4 <- sum(pweights * (x - mean_x)^4) / psum
kurt <- m4 / var_x^2
if (excess) {
kurt <- kurt - 3
}
class(kurt) <- "svykurt"
return(kurt)
}
print.svykurt <- function(x) {
m <- as.matrix(x, ncol = 1)
rownames(m) <- names(x)
colnames(m) <- "kurtosis"
print(m)
}
At the moment, I am just printing kurtosis without the standard error.
data(api)
dclus1<-svydesign(id=~dnum, weights=~pw, data=apiclus1, fpc=~fpc)
svykurt(x = ~api00, dclus1, na.rm = TRUE)
However, I eventually want to assign the output to class svystat
and print a matrix containing both values, as in svymean(x = ~api00, dclus1, na.rm = TRUE)
.
Looking at survey
source code, I can see that svymean
and svytotal
use a higher-level function called svrecvar
, e.g.,
attr(total, "var")<-v<-svyrecvar(x/design$prob,design$cluster,
design$strata, design$fpc,
postStrata=design$postStrata)
I am just not sure how to apply this to my case.
If you want the kurtosis, you are much better off working with svycontrast
than with svyrecvar
. You can write the variance and fourth central moment in terms of the raw moments (following the answer here) and then transform
> moments<-svymean(~enroll+I(enroll^2)+I(enroll^3)+I(enroll^4), dstrat)
> moments
mean SE
enroll 5.9528e+02 1.8509e+01
I(enroll^2) 5.4895e+05 4.0250e+04
I(enroll^3) 7.5137e+08 1.0066e+08
I(enroll^4) 1.3348e+12 2.7683e+11
>
> central_moments<-svycontrast(moments, list(mu4=quote(-3*enroll^4+6*enroll^2*`I(enroll^2)`-4*enroll*`I(enroll^3)`+`I(enroll^4)`),
sigma2=quote(`I(enroll^2)`-enroll^2)))
> central_moments
nlcon SE
mu4 3.3618e+11 1.0458e+11
sigma2 1.9459e+05 2.3116e+04
>
>
> svycontrast(central_moments, quote(mu4/(sigma2*sigma2)))
nlcon SE
contrast 8.8783 1.5678
As a general guide, if you have an explicit formula for a statistic in terms of moments or cell counts or proportions, it's easier to use svycontrast
and if you have an estimation equation or maximisation problem it's easier to use svyrecvar
.
To use svyrecvar
you would need to work out or look up the influence functions for the kurtosis, which is likely to be feasible but a bit annoying. If you do work out the influence functions, it's actually more straightforward to use svytotal
to work out the variance
For example, in svyVGAM:::svy_vglm.survey.design
there's a lot of code to fit a VGAM model and work out its influence functions, followed by
v <- vcov(svytotal(inffuns, design))
dimnames(v) <- list(names(coef(fit)), names(coef(fit)))
to do the variance computation.