r, metafor, meta-analysis

long-format data in metafor (one study = several rows)


Does the metafor package accept only wide-format data, in which each study has one row? Specifically, all the analysis examples seem to supply the escalc function with one-row-per-study data, i.e., the experimental and placebo results are different columns in the same row. So several-rows-per-study (long-format) data has to be transposed to this single-row format, correct?


Solution

  • The metafor package can deal just fine with multiple rows per study. The examples on multilevel/multivariate/network meta-analysis include plenty of such datasets. It sounds like you have a study with two treatment arms and one control/placebo condition. You can then set up your dataset with two rows for such a study, one for comparing treatment A against the control/placebo condition and one for comparing treatment B against the control/placebo condition. The information from the control/placebo condition is then simply repeated in the two rows. For example, say you have two studies, one as described above and one with just treatment A (plus the control/placebo condition). Then the dataset might look like this:

    dat <- data.frame(study = c(1, 1, 2), trt = c("A", "B", "A"),
                      ai = c(17, 14, 23), n1i = c( 96, 102, 215),
                      ci = c(27, 27, 35), n2i = c(101, 101, 218))
    dat
    
    #   study trt ai n1i ci n2i
    # 1     1   A 17  96 27 101
    # 2     1   B 14 102 27 101
    # 3     2   A 23 215 35 218
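
    If the raw data has one row per study arm rather than one row per comparison, it can be reshaped into the layout above with base R. The following is only a sketch: the arm-level data frame `arms` and its column names (`arm`, `events`, `n`) are hypothetical, and it assumes every study has exactly one arm labeled "control".

```r
# Hypothetical arm-level (long) data: one row per study arm
arms <- data.frame(study  = c(1, 1, 1, 2, 2),
                   arm    = c("A", "B", "control", "A", "control"),
                   events = c(17, 14, 27, 23, 35),
                   n      = c(96, 102, 101, 215, 218))

# Separate the treatment arms from the control arms
trt  <- arms[arms$arm != "control", ]
ctrl <- arms[arms$arm == "control", c("study", "events", "n")]

# Rename the columns to the names used in the dataset above
names(trt)  <- c("study", "trt", "ai", "n1i")
names(ctrl) <- c("study", "ci", "n2i")

# Join each treatment arm with the control arm of its study; the control
# data is repeated for studies with more than one treatment arm
dat <- merge(trt, ctrl, by = "study")
dat <- dat[order(dat$study, dat$trt), ]
rownames(dat) <- NULL
dat
```

    The result has one row per treatment-versus-control comparison, with the same columns as the hand-built data frame.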
    

    Here, I am assuming some kind of dichotomous outcome is measured in all studies, so we get counts of the number of people in each condition that experienced some outcome of interest, but the same setup would arise if one had means/SDs per condition. Note how the data for the control condition (ci and n2i) is repeated within study 1. Now you can compute, for example, log odds ratios per comparison with:

    library(metafor)
    dat <- escalc(measure="OR", ai=ai, n1i=n1i, ci=ci, n2i=n2i, data=dat)
    dat
    
    #   study trt ai n1i ci n2i      yi     vi
    # 1     1   A 17  96 27 101 -0.5280 0.1220
    # 2     1   B 14 102 27 101 -0.8301 0.1333
    # 3     2   A 23 215 35 218 -0.4679 0.0827
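
    To see where these numbers come from, the log odds ratios and their sampling variances can be reproduced by hand in base R. This sketch uses the usual large-sample formulas and assumes no cell count is zero, so no continuity adjustment is involved; the variable names `bi` and `di` for the non-event counts are my own.

```r
# Same counts as in the dataset above
ai <- c(17, 14, 23);  n1i <- c( 96, 102, 215)
ci <- c(27, 27, 35);  n2i <- c(101, 101, 218)

# Number of non-events in the treatment and control groups
bi <- n1i - ai
di <- n2i - ci

# Log odds ratio and its large-sample sampling variance
yi <- log((ai * di) / (bi * ci))
vi <- 1/ai + 1/bi + 1/ci + 1/di

round(cbind(yi, vi), 4)
```

    The rounded values agree with the yi and vi columns shown above.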
    

    There is one important issue to consider here. Since the data from the control group is reused in the computation of the two log odds ratios for study 1, the two estimates are not independent. Equations for computing the covariance between the two estimates can be found, for example, in Gleser and Olkin (2009). See here for further details and code: https://www.metafor-project.org/doku.php/analyses:gleser2009
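
    As a rough illustration of such a computation for study 1: under the usual large-sample approximation, the covariance between two log odds ratios that share a control group is the part of each variance contributed by that control group, 1/ci + 1/(n2i - ci). This is my own sketch of that result, not code taken from the linked page, so check it against the reference before relying on it.

```r
# Study 1 counts: arms A and B share one control group
ai <- c(17, 14);  n1i <- c(96, 102)
ci <- 27;         n2i <- 101

# Sampling variances of the two log odds ratios
vi <- 1/ai + 1/(n1i - ai) + 1/ci + 1/(n2i - ci)

# Assumed covariance: the variance of the log odds in the shared control group
cov12 <- 1/ci + 1/(n2i - ci)

# Implied correlation between the two estimates
r12 <- cov12 / sqrt(vi[1] * vi[2])
round(c(cov = cov12, cor = r12), 4)
```

    For these counts the implied correlation comes out at about 0.40.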

    However, as long as the group sizes within a study are not strongly imbalanced, the correlation between the two estimates is approximately 0.5, and one can construct the corresponding variance-covariance matrix (for the three log odds ratios) with:

    vcalc(vi, cluster=study, grp1=trt, data=dat)
    
    #            [,1]       [,2]      [,3]
    # [1,] 0.12203231 0.06378112 0.0000000
    # [2,] 0.06378112 0.13334276 0.0000000
    # [3,] 0.00000000 0.00000000 0.0827225
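
    The same matrix can be rebuilt by hand, which makes the assumption explicit: estimates from the same study get a covariance of 0.5 times the square root of the product of their variances, and estimates from different studies get 0. A minimal base R sketch, with `rho` as my own name for the assumed correlation:

```r
vi    <- c(0.12203231, 0.13334276, 0.0827225)  # sampling variances (diagonal above)
study <- c(1, 1, 2)
rho   <- 0.5                                   # assumed within-study correlation

# rho * sqrt(vi * vj) for pairs from the same study, 0 for all other pairs
V <- rho * sqrt(outer(vi, vi)) * outer(study, study, "==")

# The diagonal holds the sampling variances themselves
diag(V) <- vi
round(V, 4)
```

    A matrix of this form is what one would then supply as the V argument of rma.mv() when fitting a multivariate/multilevel model.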