Subset-friendly version of match()


I dislike (or don't understand) the match() function because it returns NA when it doesn't find a value and because it returns only the first occurrence of a match. Both are a problem when subsetting one data frame by another, at least in my situation.
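A minimal illustration of the two behaviors on plain vectors. The lookup vector `tbl` is made up for illustration; its values are chosen only to produce one miss (0) and one duplicate (3):

```r
tbl <- c(1, 2, 3, 3, 4)

# match() gives, for each element of x, the position of its FIRST match in tbl,
# and NA where there is no match at all
match(c(3, 0), tbl)
#> [1]  3 NA

# which() on an equality test returns ALL positions of a single value
which(tbl == 3)
#> [1] 3 4
```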

Let the data be

df1 <- data.frame(id=c(1:3,3:4), val= 11:15)
df2 <- data.frame(id= c(2,0,3,1), val=91:94)

I want to subset df1 by the ids in df2, so my desired output is

df_expected <- data.frame(id= c(2,3,3,1), val= c(12,13,14,11))

But if I subset df1 with df1[match(df2$id, df1$id), ], I don't get what I expect. First, match() returns NA because it can't find id 0 in df1, and that NA index adds a row containing only NAs. Second, match() returns only the first occurrence of id 3 in df1, but I want all occurrences of a match.
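Reproducing the problem with the data above makes both issues visible. The `nomatch = 0L` call is added only for comparison (`nomatch` is a documented argument of `match()`): a 0 index is dropped when subsetting, which removes the NA row, but the second id-3 row is still missing, so on its own it fixes only the first issue.

```r
df1 <- data.frame(id = c(1:3, 3:4), val = 11:15)
df2 <- data.frame(id = c(2, 0, 3, 1), val = 91:94)

idx <- match(df2$id, df1$id)
idx
#> [1]  2 NA  3  1

# Indexing with NA yields an all-NA row; the second id 3 (row 4 of df1) never appears
df1[idx, ]

# nomatch = 0L turns the miss into index 0, which R drops when subsetting,
# but match() still reports only the first id 3
df1[match(df2$id, df1$id, nomatch = 0L), ]
#>   id val
#> 2  2  12
#> 3  3  13
#> 1  1  11
```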

How can I adjust match() so that it behaves as described above?


Solution

  • You can use a combination of %in% and which(). The result keeps the row order of df1 rather than the order you posted, though.

    df1 <- data.frame(id=c(1:3,3:4), val= 11:15)
    df2 <- data.frame(id= c(2,0,3,1), val=91:94)
    
    # row numbers of df1 whose id appears anywhere in df2$id
    i <- which(df1$id %in% df2$id)
    df1[i, ]
    #>   id val
    #> 1  1  11
    #> 2  2  12
    #> 3  3  13
    #> 4  3  14
    

    Created on 2023-04-22 with reprex v2.0.2


    Edit

    The following gives the order you posted (the order of df2$id).

    df1 <- data.frame(id=c(1:3,3:4), val= 11:15)
    df2 <- data.frame(id= c(2,0,3,1), val=91:94)
    
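    i <- df1$id[df1$id %in% df2$id]  # matching ids, in df1's row order
    j <- df2$id[df2$id %in% df1$id]  # matching ids, in df2's order
    # order() returns positions within the matched subset, so subset first, then reorder
    df1[df1$id %in% df2$id, ][order(match(i, j)), ]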
    #>   id val
    #> 2  2  12
    #> 3  3  13
    #> 4  3  14
    #> 1  1  11
    


    Edit 2

    After reading Ingo Pingo's and TarJae's comments, I have changed the above solutions to the following.

    1

    i <- which(df1$id %in% df2$id)
    # map the ordering back through i so that it indexes rows of df1
    df1[i[order(match(df1$id[i], df2$id))], ]
    #>   id val
    #> 2  2  12
    #> 3  3  13
    #> 4  3  14
    #> 1  1  11
    

    2

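    j <- df1$id[df1$id %in% df2$id]
    # subset the matching rows first; order() indexes positions within that subset
    df1[df1$id %in% df2$id, ][order(match(j, df2$id)), ]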
    #>   id val
    #> 2  2  12
    #> 3  3  13
    #> 4  3  14
    #> 1  1  11
    

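Going back to the request for a match()-like function: a small helper that returns every matching position of each element of x in table, grouped in the order of x and skipping non-matches, covers both points without a separate reordering step. The name `match_all()` is hypothetical (it is not a base R function), and this is a sketch, not the answerer's method.

```r
# Hypothetical helper: all positions in `table` for each element of `x`,
# concatenated in the order of `x`; elements with no match contribute nothing.
match_all <- function(x, table) {
  unlist(lapply(x, function(v) which(table == v)), use.names = FALSE)
}

df1 <- data.frame(id = c(1:3, 3:4), val = 11:15)
df2 <- data.frame(id = c(2, 0, 3, 1), val = 91:94)

match_all(df2$id, df1$id)
#> [1] 2 3 4 1
df1[match_all(df2$id, df1$id), ]
#>   id val
#> 2  2  12
#> 3  3  13
#> 4  3  14
#> 1  1  11
```

Unlike match(), this compares with `==`, so an NA in x never matches anything, and it scans `table` once per element of x, which is fine for small inputs but slower than match() on large ones.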