This is my example. From this data frame, I want to create a new data frame containing only the rows selected by matches between column mbg and column tsg, omitting the other rows.
  mbr  mbg tsr tsg
1   1   g1   3  g4
2   2   g2   4  g3
3   3   g3   5  g2
4   4   g4   6  g1
5   5   g5   7  g5
6  NA <NA>   1  g6
7  NA <NA>   2  g7
So ideally it would return this data frame:
  mbr mbg tsr tsg
1   1  g1   3  g4
2   2  g2   4  g3
3   3  g3   5  g2
4   4  g4   6  g1
5   5  g5   7  g5
So far I've tried:
1) intersect(df$mbg,df$tsg)
but that only returns a list of the matches between the columns, e.g. g1, g2, etc.
2) df2<-[intersect(df$mbg,df$tsg),]
which returns this:
     mbr  mbg tsr  tsg
NA    NA <NA>  NA <NA>
NA.1  NA <NA>  NA <NA>
NA.2  NA <NA>  NA <NA>
NA.3  NA <NA>  NA <NA>
NA.4  NA <NA>  NA <NA>
I'm very new to R and trying to teach myself so any advice would be amazing. Thank you!
Assuming I'm interpreting what you're looking for correctly, you appear to be on the right track and are just running into issues with syntax. Try this:
df2<-df[df$mbg %in% intersect(df$mbg,df$tsg),]
intersect(df$mbg, df$tsg)
was returning the values that occur in both of those columns. Adding df before the brackets identifies the data frame you want to subset, which was missing before, and the df$mbg %in%
part says that you want the rows where the value of mbg is included in the intersection.
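For reference, here is a minimal, self-contained sketch of the whole thing. The data frame is rebuilt by hand from the values shown in the question, so the column types (numeric and character) are an assumption, and the variable name common is only for illustration.

```r
# Rebuild the example data frame from the question.
# Column types are assumed; the original may have used factors.
df <- data.frame(
  mbr = c(1, 2, 3, 4, 5, NA, NA),
  mbg = c("g1", "g2", "g3", "g4", "g5", NA, NA),
  tsr = c(3, 4, 5, 6, 7, 1, 2),
  tsg = c("g4", "g3", "g2", "g1", "g5", "g6", "g7"),
  stringsAsFactors = FALSE
)

# Values that occur in both columns.
common <- intersect(df$mbg, df$tsg)

# Logical row index: TRUE where the row's mbg value is one of the shared values.
# NA in mbg yields FALSE here, so those rows are dropped.
df2 <- df[df$mbg %in% common, ]
print(df2)
```

Note that this filters on mbg only. If you also need every kept row's tsg value to be in the intersection, you could combine both conditions with df$mbg %in% common & df$tsg %in% common. As for the rows of NA you got earlier: indexing a data frame with a character vector looks the values up as row names, and since no rows are named g1, g2, and so on, R returns NA rows.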