
Subsetting a data frame based on the values in another data frame


I have two data frames: df, the main data frame, and g, which I created separately:

df = data.frame(x = seq(1,20,2),y = letters[1:10] )
df

g = data.frame(xx = c(2,3,4,5,7,8,9) )

I want to keep only the rows of df whose x value appears in the xx column of g. I tried the following:

m = df[df$x==g$xx,]

But the result seems to depend on the positions at which the two columns happen to line up, not on the matched values themselves.

Output:

> m
  x y
2 3 b

I don't know what error I am making.


Solution

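  • You probably need %in% instead of ==. The == operator compares the two vectors element by element, recycling the shorter one, whereas %in% checks whether each value of df$x occurs anywhere in g$xx: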

    > df[df$x %in% g$xx,]
      x y
    2 3 b
    3 5 c
    4 7 d
    5 9 e
    

    You can also use inner_join from dplyr:

    library(dplyr)
    df %>% 
      inner_join(g, by = c("x" = "xx"))
    

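    intersect can be useful too, but note that it returns the matching values, not row positions, so convert them to row indices with match before subsetting:

    df[match(intersect(df$x, g$xx), df$x),]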
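
To see why == returned a single row in the question, the sketch below spells out the recycling that R does behind the scenes and compares it with the membership test. It uses the same df and g as the question; the variable names recycled, elementwise, and membership are only illustrative.

```r
# Same data as in the question
df <- data.frame(x = seq(1, 20, 2), y = letters[1:10])
g  <- data.frame(xx = c(2, 3, 4, 5, 7, 8, 9))

# `==` works position by position. g$xx has 7 values and df$x has 10,
# so R recycles g$xx to length 10 (and warns that 10 is not a multiple of 7).
recycled    <- rep(g$xx, length.out = nrow(df))
elementwise <- df$x == recycled

# `%in%` asks, for each value of df$x, whether it occurs anywhere in g$xx.
membership <- df$x %in% g$xx

which(elementwise)  # 2        -> only row 2 happens to line up
which(membership)   # 2 3 4 5  -> every row whose x is present in g$xx

df[membership, ]
```

Only the second element matches position for position (3 against 3), which is why the original attempt kept just row 2.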