
How to subset multiple data frames by a variable?


I have an R object called "imps" that contains multiple imputed datasets, each stored as a data frame.

Within each of those data frames, there is a column (or variable) for gender (where gender=1 or gender=0).

I'm trying to figure out if there's a way for me to re-subset "imps" where all the data frames within it only contain observations depending on whether gender=1 or gender=0.

I understand how to do this for a single data frame: I pick one and run the subset function on it, for example:

imputed_data1 <- imps[[5]] #selecting the 5th imputed dataset

imputed_gender <- subset(imputed_data1, gender==1)

My issue is that I want to keep all the data frames (there are hundreds of them), but go inside each one and select only the observations where gender=1 or gender=0.

Is this possible to do? Any help would be much appreciated.


Solution

  • We can apply subset to every data frame in the list with lapply

    imps1 <- lapply(imps, subset, subset = gender == 1)
    imps0 <- lapply(imps, subset, subset = gender == 0)
    

    Or, using the tidyverse

    library(dplyr)
    library(purrr)
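    # keep only the rows with gender == 1 (or 0) in each data frame
    imps1 <- map(imps, ~ .x %>%
                       filter(gender == 1))
    imps0 <- map(imps, ~ .x %>%
                       filter(gender == 0))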
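
To try the idea without the real data, here is a minimal base R sketch. It assumes "imps" behaves like a plain list of data frames; `imps_toy` and its `id` and `score` columns are made up for illustration. It uses logical indexing inside an anonymous function instead of subset, which gives the same rows here (assuming gender has no missing values) and sidesteps subset's non-standard evaluation.

```r
# Hypothetical stand-in for `imps`: a list of three small data frames.
set.seed(1)
imps_toy <- lapply(1:3, function(i) {
  data.frame(id     = 1:6,
             gender = c(1, 0, 1, 0, 0, 1),
             score  = rnorm(6))
})

# Filter every data frame in the list; the list keeps its length and order.
imps1_toy <- lapply(imps_toy, function(d) d[d$gender == 1, , drop = FALSE])
imps0_toy <- lapply(imps_toy, function(d) d[d$gender == 0, , drop = FALSE])

# Quick checks: same number of data frames, one gender value in each.
length(imps1_toy) == length(imps_toy)
all(sapply(imps1_toy, function(d) all(d$gender == 1)))
all(sapply(imps0_toy, function(d) all(d$gender == 0)))
```

If both groups are needed at once, `lapply(imps_toy, function(d) split(d, d$gender))` should return, for each data frame, a two-element list keyed by the gender value.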