Search code examples
rmulti-levelr-forestplotmetafor

How to plot multi-level meta-analysis by study (in contrast to the subgroup)?


I am doing a multi-level meta-analysis. Many studies have several subgroups. When I make a forest plot studies are presented as subgroups. There are 60 of them, however, I would like to plot studies according to the study, then it would be 25 studies and it would be more appropriate. Does anyone have an idea how to do this forest plot?

I did it this way:

full.model <- rma.mv(yi = yi, 
                     V = vi, 
                     slab = Author,
                     data = df,
                     random = ~ 1 | Author/Study, 
                     test = "t", 
                     method = "REML")
forest(full.model)

Solution

  • It is not clear to me if you want to aggregate to the Author level or to the Study level. If there are multiple rows of data for particular studies, then the model isn't really complete and you would want to add another random intercept for the level of the estimates within studies. Essentially, the lowest random effect should have as many values for nlvls in the output as there are estimates (k).

    Let's first tackle the case where we have a multilevel structure with two levels, studies and multiple estimates within studies (for some technical reasons, some might call this a three-level model, but let's not get into this). I will use a fully reproducible example for illustration purposes, using the dat.konstantopoulos2011 dataset, where we have districts and schools within districts. We fit a multilevel model of the type as you have with:

    library(metafor)   
    dat <- dat.konstantopoulos2011
    res <- rma.mv(yi, vi, random = ~ 1 | district/school, data=dat)
    res
    

    We can aggregate the estimates to the district level using the aggregate() function, specifying the marginal var-cov matrix of the estimates from the model to account for their non-independence (note that this makes use of aggregate.escalc() which only works with escalc objects, so if it is not, you need to convert the dataset to one - see help(aggregate.escalc) for details):

    agg <- aggregate(dat, cluster=dat$district, V=vcov(res, type="obs"))
    agg
    

    You will find that if you then fit an equal-effects model to these estimates based on the aggregated data that the results are identical to what you obtained from the multilevel model (we use an equal-effects model since the heterogeneity accounted for by the multilevel model is already encapsulated in vcov(res, type="obs")):

    rma(yi, vi, method="EE", data=agg)
    

    So, we can now use these aggregated values in a forest plot:

    with(agg, forest(yi, vi, slab=district))
    

    My guess based on your description is that you actually have an additional level that you should include in the model and that you want to aggregate to the intermediate level. This is a tad more complicated, since aggregate() isn't meant for that. Just for illustration purposes, say we use year as another (higher) level and I will mess a bit with the data so that all three variance components are non-zero (again, just for illustration purposes):

    dat$yi[dat$year == 1976] <- dat$yi[dat$year == 1976] + 0.8
    res <- rma.mv(yi, vi, random = ~ 1 | year/district/school, data=dat)
    res
    

    Now instead of aggregate(), we can accomplish the same thing by using a multivariate model, including the intermediate level as a factor and using again vcov(res, type="obs") as the var-cov matrix:

    agg <- rma.mv(yi, V=vcov(res, type="obs"), mods = ~ 0 + factor(district), data=dat)
    agg
    

    Now the model coefficients of this model are the aggregated values and the var-cov matrix of the model coefficients is the var-cov matrix of these aggregated values:

    coef(agg)
    vcov(agg)
    

    They are not all independent (since we haven't aggregated to the highest level), so if we want to check that we can obtain the same results as from the multilevel model, we must account for this dependency:

    rma.mv(coef(agg), V=vcov(agg), method="EE")
    

    Again, exactly the same results. So now we use these coefficients and the diagonal from vcov(agg) as their sampling variances in the forest plot:

    forest(coef(agg), diag(vcov(agg)), slab=names(coef(agg)))
    

    The forest plot cannot indicate the dependency that still remains in these values, so if one were to meta-analyze these aggregated values using only diag(vcov(agg)) as their sampling variances, the results would not be identical to what you get from the full multilevel model. But there isn't really a way around that and the plot is just a visualization of the aggregated estimates and the CIs shown are correct.