Search code examples
rlistdataframesubset

Select rows based on column value in a list of dataframes


I have a list of dataframes and each one looks like this:

df1:

Name X Y
AAA 10 5
AAA 20 10
AAA 30 15
AAA 40 20

df2:

Name X Y
BBB 20 10
BBB 30 15
BBB 40 20

df3:

Name X Y
CCC 10 5
CCC 20 10
CCC 30 15
CCC 40 20

And I have another dataframe like this:

ID Name
1 AAA
2 CCC
3 FFF

I would like to extract the dataframes from the list that have the same names as the last dataframe. So, in this case, I would get only df1 and df3.


Solution

  • You can do this in base R using lapply and indexing. Below, the unlist(lapply(ll, function(x) any(x$Name %in% mtch$Name))) tests each nested data frame to see if the name matches names in the provided key, and returns a boolean vector that you can index on.

    ll <- list(df1, df2, df3)
    
    ll[unlist(lapply(ll, function(x) any(x$Name %in% mtch$Name)))]
    

    output:

    [[1]]
      Name  X  Y
    1  AAA 10  5
    2  AAA 20 10
    3  AAA 30 15
    4  AAA 40 20
    
    [[2]]
      Name  X  Y
    1  CCC 10  5
    2  CCC 20 10
    3  CCC 30 15
    4  CCC 40 20
    

    Data:

    df1  <- read.table(text = "Name X   Y
    AAA 10  5
    AAA 20  10
    AAA 30  15
    AAA 40  20", h = T)
    
    df2  <- read.table(text = "Name X   Y
    BBB 20  10
    BBB 30  15
    BBB 40  20", h = T)
    
    df3  <- read.table(text = "Name X   Y
    CCC 10  5
    CCC 20  10
    CCC 30  15
    CCC 40  20", h = T)
    
    mtch <- read.table(text = "ID   Name
           1    AAA
           2    CCC
           3    FFF", h = T)