
Propensity score matching with the MatchIt package: how to bind rows from several datasets into a final dataset with matched characteristics


I'm expanding this post -- answered by @edwards (Thanks).

I'm working with panel data. We assessed children in 2019, 2020, 2021, and 2022, so I have four datasets (one per year). I want to create a fifth dataset that draws participants from the 2nd, 3rd, and 4th datasets so that they match the characteristics of the first dataset (2019). This final dataset will have fewer participants, but they'll share the same characteristics as their "peers" from 2019: the proportion of boys and girls will be about the same as in 2019, the mother's age will be about the same, etc.

I read the documentation for MatchIt. I know I can use bind_rows, but I haven't been able to get it to work.


My final comparison table should look like this:

[image: expected comparison table]

My code

library(dplyr)    # bind_rows(), mutate()
library(MatchIt)  # matchit(), match.data(); method = "optimal" also needs the optmatch package
library(arsenal)  # tableby()

data <- bind_rows(df_2019, df_2020, .id = "year") |>
  mutate(year = +(year == 1)) # 1 = 2019 (treated), 0 = 2020 (controls)

match_obj <- matchit(year ~ asqse_quest + year_completed_cat + sex_male + momage + momed + income,
                     data = data,
                     exact = ~ momed + income,
                     method = "optimal")
summary(match_obj)
matched_data <- match.data(match_obj)
# Summarize the merged dataset
matched_data %>%
  tableby(year_completed_cat ~ asqse_risk + momage + momed + income_poor, data = .) %>%
  summary(text = TRUE, digits = 2) %>%
  as.data.frame()

Any suggestions are welcome.

df_2019 = structure(list(asqse_quest = c(24, 24, 24, 24, 24, 24, 24, 24, 
                                         24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 
                                         24, 24, 24, 24, 24, 24), asqse_risk = c(0, 0, 0, 0, 0, 0, 1, 
                                                                                 1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0, 0, 
                                                                                 0, 1), year_completed_cat = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 
                                                                                                                         2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
                                                                                                                         2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), levels = c("18", "19", "20", 
                                                                                                                                                                     "21", "22", "23", "24"), class = "factor"), sex_male = c(1, 1, 
                                                                                                                                                                                                                              0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 
                                                                                                                                                                                                                              1, 1, 0, 1, 1, 0, 1), momage = c(35, 35, 22, 38, 33, 30, 32, 
                                                                                                                                                                                                                                                               26, 21, 34, 29, 19, 35, 29, 30, 18, 30, 34, 37, 31, 23, 30, 31, 
                                                                                                                                                                                                                                                               23, 27, 30, 36, 28, 28, 29), momed = c("4", "4", "2", "2", "3", 
                                                                                                                                                                                                                                                                                                      "4", "2", "2", "4", "2", "4", "2", "4", "4", "3", "2", "4", "2", 
                                                                                                                                                                                                                                                                                                      "4", "4", "3", "3", "2", "2", "2", "4", "4", "3", "2", "4"), 
                         income = c("4", "4", "2", "3", "4", "4", "2", "3", "2", "2", 
                                    "4", "2", "4", "5", "3", "1", "4", "4", "3", "4", "3", "4", 
                                    "2", "5", "1", "4", "4", "4", "4", "5")), row.names = c(NA, 
                                                                                            -30L), class = "data.frame", na.action = structure(c(`4` = 4L, 
                                                                                                                                                 `14` = 14L, `18` = 18L, `22` = 22L, `40` = 40L, `46` = 46L, `51` = 51L, 
                                                                                                                                                 `55` = 55L, `61` = 61L, `64` = 64L, `70` = 70L, `86` = 86L, `109` = 109L, 
                                                                                                                                                 `113` = 113L, `115` = 115L, `143` = 143L, `145` = 145L, `148` = 148L, 
                                                                                                                                                 `156` = 156L, `159` = 159L, `160` = 160L, `161` = 161L, `168` = 168L, 
                                                                                                                                                 `175` = 175L, `205` = 205L, `206` = 206L, `209` = 209L, `221` = 221L, 
                                                                                                                                                 `222` = 222L, `227` = 227L, `235` = 235L, `241` = 241L, `246` = 246L, 
                                                                                                                                                 `255` = 255L, `256` = 256L, `258` = 258L, `267` = 267L, `272` = 272L, 
                                                                                                                                                 `273` = 273L, `276` = 276L, `283` = 283L, `294` = 294L, `298` = 298L, 
                                                                                                                                                 `304` = 304L, `311` = 311L, `312` = 312L, `316` = 316L, `341` = 341L, 
                                                                                                                                                 `345` = 345L, `350` = 350L, `352` = 352L, `357` = 357L, `371` = 371L, 
                                                                                                                                                 `375` = 375L, `388` = 388L, `391` = 391L, `393` = 393L, `398` = 398L, 
                                                                                                                                                 `400` = 400L, `401` = 401L, `410` = 410L, `422` = 422L, `426` = 426L, 
                                                                                                                                                 `441` = 441L, `446` = 446L, `450` = 450L, `456` = 456L, `459` = 459L, 
                                                                                                                                                 `492` = 492L, `496` = 496L, `504` = 504L, `506` = 506L, `514` = 514L, 
                                                                                                                                                 `520` = 520L, `522` = 522L, `526` = 526L, `532` = 532L, `534` = 534L, 
                                                                                                                                                 `537` = 537L, `540` = 540L, `544` = 544L, `546` = 546L, `548` = 548L, 
                                                                                                                                                 `557` = 557L, `562` = 562L, `565` = 565L, `575` = 575L, `577` = 577L, 
                                                                                                                                                 `585` = 585L, `595` = 595L, `609` = 609L, `612` = 612L, `622` = 622L, 
                                                                                                                                                 `631` = 631L, `640` = 640L, `646` = 646L, `659` = 659L, `665` = 665L, 
                                                                                                                                                 `669` = 669L, `676` = 676L, `677` = 677L, `685` = 685L, `709` = 709L, 
                                                                                                                                                 `710` = 710L), class = "omit"))

df_2020 = structure(list(asqse_quest = c(24, 24, 24, 24, 24, 24, 24, 24, 
                                         24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 
                                         24, 24, 24, 24, 24, 24), asqse_risk = c(0, 0, 0, 0, 0, 1, 0, 
                                                                                 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 
                                                                                 0, 0), year_completed_cat = structure(c(3L, 3L, 3L, 3L, 3L, 3L, 
                                                                                                                         3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
                                                                                                                         3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), levels = c("18", "19", "20", 
                                                                                                                                                                     "21", "22", "23", "24"), class = "factor"), sex_male = c(0, 0, 
                                                                                                                                                                                                                              0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 
                                                                                                                                                                                                                              0, 1, 1, 1, 1, 1, 1), momage = c(27, 40, 25, 35, 27, 26, 31, 
                                                                                                                                                                                                                                                               31, 40, 33, 26, 21, 37, 36, 29, 30, 33, 20, 26, 33, 25, 32, 28, 
                                                                                                                                                                                                                                                               34, 36, 36, 36, 39, 35, 25), momed = c("3", "4", "2", "2", "2", 
                                                                                                                                                                                                                                                                                                      "3", "4", "4", "4", "1", "2", "3", "4", "4", "2", "2", "4", "2", 
                                                                                                                                                                                                                                                                                                      "1", "4", "1", "3", "3", "4", "3", "4", "4", "3", "4", "2"), 
                         income = c("4", "4", "4", "4", "4", "5", "4", "4", "4", "1", 
                                    "5", "3", "4", "4", "5", "4", "4", "2", "1", "4", "1", "3", 
                                    "3", "4", "5", "4", "4", "3", "5", "4")), row.names = c(NA, 
                                                                                            -30L), class = "data.frame", na.action = structure(c(`5` = 5L, 
                                                                                                                                                 `8` = 8L, `13` = 13L, `21` = 21L, `26` = 26L, `32` = 32L, `33` = 33L, 
                                                                                                                                                 `34` = 34L, `42` = 42L, `46` = 46L, `51` = 51L, `57` = 57L, `58` = 58L, 
                                                                                                                                                 `60` = 60L, `64` = 64L, `67` = 67L, `70` = 70L, `71` = 71L, `73` = 73L, 
                                                                                                                                                 `78` = 78L, `85` = 85L, `86` = 86L, `90` = 90L, `99` = 99L, `102` = 102L, 
                                                                                                                                                 `103` = 103L, `115` = 115L, `116` = 116L, `117` = 117L, `128` = 128L, 
                                                                                                                                                 `129` = 129L, `131` = 131L, `139` = 139L, `147` = 147L, `151` = 151L, 
                                                                                                                                                 `154` = 154L, `174` = 174L, `181` = 181L, `185` = 185L, `192` = 192L, 
                                                                                                                                                 `198` = 198L, `202` = 202L, `204` = 204L, `205` = 205L, `212` = 212L, 
                                                                                                                                                 `216` = 216L, `228` = 228L, `231` = 231L, `242` = 242L, `244` = 244L, 
                                                                                                                                                 `250` = 250L, `252` = 252L, `253` = 253L, `257` = 257L, `282` = 282L, 
                                                                                                                                                 `283` = 283L, `289` = 289L, `290` = 290L, `297` = 297L, `301` = 301L, 
                                                                                                                                                 `305` = 305L, `312` = 312L, `321` = 321L, `324` = 324L, `325` = 325L, 
                                                                                                                                                 `344` = 344L, `350` = 350L, `351` = 351L, `353` = 353L, `358` = 358L, 
                                                                                                                                                 `363` = 363L, `364` = 364L, `374` = 374L, `375` = 375L, `379` = 379L, 
                                                                                                                                                 `380` = 380L, `393` = 393L, `396` = 396L, `398` = 398L, `414` = 414L, 
                                                                                                                                                 `420` = 420L, `422` = 422L, `427` = 427L, `432` = 432L, `433` = 433L, 
                                                                                                                                                 `438` = 438L, `439` = 439L, `443` = 443L, `456` = 456L, `459` = 459L, 
                                                                                                                                                 `461` = 461L, `465` = 465L, `468` = 468L, `471` = 471L, `482` = 482L, 
                                                                                                                                                 `496` = 496L, `498` = 498L, `499` = 499L, `505` = 505L, `524` = 524L, 
                                                                                                                                                 `525` = 525L, `528` = 528L, `530` = 530L, `544` = 544L, `550` = 550L, 
                                                                                                                                                 `556` = 556L, `562` = 562L, `565` = 565L, `569` = 569L, `570` = 570L, 
                                                                                                                                                 `571` = 571L, `578` = 578L, `579` = 579L, `601` = 601L, `613` = 613L, 
                                                                                                                                                 `614` = 614L, `618` = 618L, `637` = 637L, `653` = 653L, `655` = 655L, 
                                                                                                                                                 `663` = 663L, `669` = 669L, `671` = 671L, `693` = 693L, `696` = 696L, 
                                                                                                                                                 `710` = 710L, `718` = 718L, `720` = 720L, `722` = 722L, `726` = 726L, 
                                                                                                                                                 `731` = 731L, `733` = 733L, `736` = 736L, `744` = 744L, `761` = 761L, 
                                                                                                                                                 `771` = 771L, `773` = 773L, `775` = 775L, `804` = 804L, `806` = 806L, 
                                                                                                                                                 `809` = 809L, `811` = 811L, `825` = 825L, `832` = 832L, `848` = 848L, 
                                                                                                                                                 `853` = 853L, `855` = 855L, `856` = 856L, `858` = 858L, `861` = 861L
                                                                                            ), class = "omit"))



df_2021 = structure(list(asqse_quest = c(24, 24, 24, 24, 24, 24, 24, 24, 
                                         24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 24, 
                                         24, 24, 24, 24, 24, 24), asqse_risk = c(0, 0, 0, 1, 0, 0, 0, 
                                                                                 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 
                                                                                 0, 1), year_completed_cat = structure(c(4L, 4L, 4L, 4L, 4L, 4L, 
                                                                                                                         4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 
                                                                                                                         4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), levels = c("18", "19", "20", 
                                                                                                                                                                     "21", "22", "23", "24"), class = "factor"), sex_male = c(0, 0, 
                                                                                                                                                                                                                              1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 
                                                                                                                                                                                                                              0, 0, 0, 1, 0, 0, 0), momage = c(29, 34, 34, 25, 31, 33, 24, 
                                                                                                                                                                                                                                                               26, 21, 30, 33, 41, 27, 32, 32, 30, 26, 32, 37, 20, 37, 29, 37, 
                                                                                                                                                                                                                                                               31, 33, 27, 32, 33, 37, 33), momed = c("4", "2", "4", "2", "4", 
                                                                                                                                                                                                                                                                                                      "4", "3", "4", "2", "4", "2", "3", "2", "4", "4", "3", "4", "2", 
                                                                                                                                                                                                                                                                                                      "4", "2", "4", "2", "4", "2", "2", "3", "3", "4", "3", "4"), 
                         income = c("4", "1", "4", "3", "4", "4", "2", "4", "2", "4", 
                                    "4", "4", "3", "4", "4", "4", "4", "1", "4", "3", "4", "5", 
                                    "4", "5", "4", "5", "3", "4", "4", "3")), row.names = c(NA, 
                                                                                            -30L), class = "data.frame", na.action = structure(c(`10` = 10L, 
                                                                                                                                                 `11` = 11L, `26` = 26L, `27` = 27L, `28` = 28L, `29` = 29L, `30` = 30L, 
                                                                                                                                                 `31` = 31L, `32` = 32L, `33` = 33L, `42` = 42L, `53` = 53L, `66` = 66L, 
                                                                                                                                                 `81` = 81L, `84` = 84L, `94` = 94L, `99` = 99L, `106` = 106L, 
                                                                                                                                                 `113` = 113L, `115` = 115L, `118` = 118L, `119` = 119L, `125` = 125L, 
                                                                                                                                                 `133` = 133L, `137` = 137L, `147` = 147L, `155` = 155L, `165` = 165L, 
                                                                                                                                                 `170` = 170L, `171` = 171L, `188` = 188L, `189` = 189L, `192` = 192L, 
                                                                                                                                                 `199` = 199L, `206` = 206L, `207` = 207L, `208` = 208L, `212` = 212L, 
                                                                                                                                                 `213` = 213L, `215` = 215L, `217` = 217L, `220` = 220L, `228` = 228L, 
                                                                                                                                                 `239` = 239L, `242` = 242L, `245` = 245L, `246` = 246L, `250` = 250L, 
                                                                                                                                                 `255` = 255L, `256` = 256L, `267` = 267L), class = "omit"))

Created on 2024-07-13 with reprex v2.1.0


Solution

  • Your bind_rows call leaves out the data frame for 2021 (df_2021).

    data <- bind_rows(df_2019, df_2020, df_2021, .id="year") |>
      mutate(year=+(year==1)) # 1=2019 (treated), 0=2020+2021 (controls)
    

    Also, in your matchit call, you probably want to raise the ratio argument (the default is 1, i.e. 1:1 matching), because you now have more control units available to match.

    library(MatchIt)
    
    match_obj <- matchit(year ~ asqse_quest+year_completed_cat+sex_male+momage+momed+income,
                         data = data, 
                         ratio=2,
                         exact= ~ momed+income,
                         method = "optimal")
    summary(match_obj)
    

    Sample Sizes:
                  Control Treated
    All             60.        30
    Matched (ESS)   36.76      25
    Matched         41.        25
    Unmatched       19.         5
    Discarded        0.         0
    

    If you're still getting too few matches and you don't need every covariate matched exactly, drop the ones you don't care about from the exact argument.

    match_obj <- matchit(year ~ asqse_quest+year_completed_cat+sex_male+momage+momed+income,
                         data = data, 
                         ratio=2,
                         exact= ~ momed, # + income
                         method = "optimal")
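
One side effect of recoding `year` into a 0/1 indicator is that the original year label is lost, which makes a per-year comparison table harder to build afterwards. A minimal sketch of one way around this, assuming dplyr is available: pass a named list to `bind_rows` so that `.id` stores the year itself, and put the indicator in a separate column. The column name `treated` and the toy data frames below are illustrative assumptions, not part of the original data.

```r
library(dplyr)

# Toy stand-ins for the yearly data frames (hypothetical values, not the real data)
df_2019 <- data.frame(sex_male = c(1, 0, 1), momage = c(35, 22, 30))
df_2020 <- data.frame(sex_male = c(0, 0, 1), momage = c(27, 40, 26))
df_2021 <- data.frame(sex_male = c(0, 1),    momage = c(29, 34))

# Naming the list elements makes .id store the year label instead of 1, 2, 3
stacked <- bind_rows(
  list("2019" = df_2019, "2020" = df_2020, "2021" = df_2021),
  .id = "year"
) |>
  mutate(treated = as.integer(year == "2019"))  # 1 = 2019 (reference), 0 = later years

# Rows per year and treatment status
count(stacked, year, treated)
```

With this layout the matching formula would start with `treated ~ ...` rather than `year ~ ...`. Because `match.data` returns the original columns alongside the matching weights, the `year` column should still be available for grouping the final table.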