Search code examples
rregressionsurveyweighted

Inconsistent weights with Lumley's survey package (svyglm vs svyolr). Error or by design?


My main goal is conducting mediation (using the mediation R package) while accounting for sampling weights (using the survey package). This led me to discover that weights are stored differently for models fit with survey::svyglm and for models fit with survey::svyolr.

set.seed(3459823)

library(survey)
library(srvyr)
library(tidyverse)
library(mediation)

n <- 100

# simulate data
dat <- tibble::tibble(
  id = 1:100,
  y_binary = sample(c(0,1), size = n, replace = TRUE),
  m_binary = sample(c(0,1), size = n, replace = TRUE),
  m_ordinal = sample(c(1, 2, 3), size = n, replace = TRUE, prob = c(0.2, 0.3, 0.5)) |> 
    as.factor() |> 
    as.ordered(),
  pred_normal = rnorm(n = n),
  weight = rnorm(n = n, mean = 15),
  weight_rep1 = rnorm(n = n, mean = weight),
  weight_rep2 = rnorm(n = n, mean = weight),
  weight_rep3 = rnorm(n = n, mean = weight),
  weight_rep4 = rnorm(n = n, mean = weight),
  weight_rep5 = rnorm(n = n, mean = weight)
)

# create design object
dat_design <- dat |> 
  srvyr::as_survey_rep(
    repweights = dplyr::contains("_rep"),
    weights = weight,
    combined_weights = TRUE
  )

# ordinal mediator models
model_y_binary <- survey::svyglm(
  formula = y_binary ~ m_ordinal + pred_normal, 
  design = dat_design, 
  family = "binomial"
)

model_m_ordinal <- survey::svyolr(
  formula = m_ordinal ~ pred_normal, 
  design = dat_design
)

mediation_ordinal <- mediation::mediate(
  model.m = model_m_ordinal,
  model.y = model_y_binary,
  sims = 50,
  treat = "pred_normal",
  mediator = "m_ordinal"
)

# Won't run!
# Error in mediation::mediate(model.m = model_m_ordinal, model.y = model_y_binary,  : 
#   weights on outcome and mediator models not identical

I then inspected the weights saved in the model objects using the model.frame() function, which is also what mediation::mediate seems to use under the hood to extract the weights.

# Inspecting the weights saved by the survey models:

model.frame(model_y_binary) |> dplyr::slice_head()

# Result:
#   y_binary m_ordinal pred_normal (weights)
# 1        1         3 -0.09479119  1.004946

# so, the binomial model stores the weights in a column in the model.frame

model.frame(model_m_ordinal) |> dplyr::slice_head()

# Result:
#   m_ordinal pred_normal (weights)
# 1         3 -0.09479119  14.92443

# so, the ordinal model stores the weight as well, but they are different!

weight_comparison <- tibble::tibble(
  binomial_weights = model.frame(model_y_binary)$`(weights`,
  ordinal_weights = model.frame(model_m_ordinal)$`(weights)`,
  weight_quotient = binomial_weights / ordinal_weights
)

weight_comparison |> dplyr::slice_head(n = 5)

# Result:
#   binomial_weights ordinal_weights weight_quotient
#              <dbl>           <dbl>           <dbl>
# 1            1.00             14.9          0.0673
# 2            1.03             15.2          0.0673
# 3            0.992            14.7          0.0673
# 4            1.01             14.9          0.0673
# 5            0.927            13.8          0.0673

# so, all the binomial weights seem to be equal to the ordinal weights times ~0.0673

Why does this happen?

Is there a way to alter the fit survey::svyolr model object and force the weights to be equal to those stored in the svyglm model object?

Could the mediation package be altered to take this into account? For example by just taking the weights from model.m and not comparing them to model.y? (Of course this could also lead to other problems.)


Solution

  • Some functions in the survey package rescale the weights to have unit mean because the large weights (thousands or tens of thousands) in some national surveys can cause convergence problems.

    Since the results aren't affected at all, the mediate package could probably work around this fairly easily. However, there's a rescale=FALSE option to svyglm to not rescale the weights, provided for this sort of purpose.

    If you then have convergence problems in svyglm you could manually rescale the weights to have unit mean before doing any of the analyses.