r, dataframe, duplicates, subset, intersect

Is there a function in R that will let me create a new data frame that contains the duplicated values from the first data frame?


This is my example. From this data frame I want to create a new data frame that contains only the rows whose value in column mbg also appears in column tsg, omitting the other rows.

  mbr  mbg tsr tsg
1   1   g1   3  g4
2   2   g2   4  g3
3   3   g3   5  g2
4   4   g4   6  g1
5   5   g5   7  g5
6  NA <NA>   1  g6
7  NA <NA>   2  g7

So ideally it would return this data frame:

  mbr  mbg tsr tsg
1   1   g1   3  g4
2   2   g2   4  g3
3   3   g3   5  g2
4   4   g4   6  g1
5   5   g5   7  g5

So far I've tried:

1) intersect(df$mbg,df$tsg) but that only returns a vector of the matches between the columns, e.g. g1, g2, etc.

2) df2<-[intersect(df$mbg,df$tsg),]

which returns this:

     mbr  mbg tsr  tsg
NA    NA <NA>  NA <NA>
NA.1  NA <NA>  NA <NA>
NA.2  NA <NA>  NA <NA>
NA.3  NA <NA>  NA <NA>
NA.4  NA <NA>  NA <NA>

I'm very new to R and trying to teach myself so any advice would be amazing. Thank you!


Solution

  • Assuming I'm interpreting what you're looking for correctly, you appear to be on the right track and are just running into syntax issues. Try this:

    df2<-df[df$mbg %in% intersect(df$mbg,df$tsg),]
    

    intersect(df$mbg, df$tsg) returns the values that occur in both of those columns. Adding df before the brackets identifies the data frame you want a subset of, which was missing before, and the df$mbg %in% part says that you want the rows where the value of mbg is included in the intersection.
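
For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the example data frame from the question and applies the subset. The column types (numeric and character) and the helper names common and attempt are assumptions made for illustration; the original post does not show how df was created. It also shows one likely reason the earlier attempt produced rows of NA: a character vector in the row position is looked up against the row names, and none of the row names are g1, g2, and so on.

```r
# Rebuild the example data frame from the question.
# Assumption: mbr/tsr are numeric and mbg/tsg are character, not factors.
df <- data.frame(
  mbr = c(1, 2, 3, 4, 5, NA, NA),
  mbg = c("g1", "g2", "g3", "g4", "g5", NA, NA),
  tsr = c(3, 4, 5, 6, 7, 1, 2),
  tsg = c("g4", "g3", "g2", "g1", "g5", "g6", "g7"),
  stringsAsFactors = FALSE
)

# Values that occur in both columns ("common" is a name chosen here).
common <- intersect(df$mbg, df$tsg)

# Indexing rows with the character vector itself matches against row names
# ("1" to "7"), finds nothing, and so returns one all-NA row per value.
attempt <- df[common, ]

# Build a logical vector instead: TRUE where mbg is one of the shared values.
# NA entries in mbg are not in "common", so those rows are dropped.
df2 <- df[df$mbg %in% common, ]
print(df2)
```

If mbg and tsg were stored as factors instead, converting them with as.character() before calling intersect() would be a reasonable precaution.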