Tags: r, dataframe, dplyr, semi-join

Using semi_join to find common genes, but it fails with a "no common variables" error


I am trying to find the genes that two columns have in common, so that I can later work with only those shared genes. Below is my code:

top100_1Beta <- data.frame(grp1_Beta$my_data.SYMBOL[1:100])
top100_2Beta <- data.frame(grp2_Beta$my_data.SYMBOL[1:100])
common100_Beta <- semi_join(top100_1Beta, top100_2Beta)

When I run the code I get the following error:

Error: by required, because the data sources have no common variables

This seems wrong: when I open top100_1Beta and top100_2Beta, I can see that at least the first few rows list exactly the same genes: ATP2A1, SLMAP, MEOX2, ...

I am confused about why it reports that there is nothing in common. Any help would be greatly appreciated. Thanks!


Solution

  • I don't think you need any form of *_join here; instead, it seems you're looking for intersect:

    intersect(grp1_Beta$my_data.SYMBOL[1:100], grp2_Beta$my_data.SYMBOL[1:100])
    

    This returns a vector of the entries common to the first 100 entries of grp1_Beta$my_data.SYMBOL and grp2_Beta$my_data.SYMBOL.
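
As for why the original call failed: the error is about column names, not gene values. When an unnamed expression is passed to data.frame(), the column name is derived from that expression, so the two data frames end up with different column names and semi_join has nothing to join on. The sketch below illustrates this with made-up toy data; the stand-in contents of grp1_Beta and grp2_Beta and the column name SYMBOL are assumptions for illustration, not taken from the question.

```r
library(dplyr)

# Hypothetical stand-ins for grp1_Beta and grp2_Beta (toy data, not the real gene lists)
grp1_Beta <- data.frame(my_data.SYMBOL = c("ATP2A1", "SLMAP", "MEOX2", "GENE_A"),
                        stringsAsFactors = FALSE)
grp2_Beta <- data.frame(my_data.SYMBOL = c("ATP2A1", "SLMAP", "GENE_B", "MEOX2"),
                        stringsAsFactors = FALSE)

# Unnamed argument: each column name is built from the expression itself,
# so the two names differ (one starts with grp1_Beta, the other with grp2_Beta)
bad1 <- data.frame(grp1_Beta$my_data.SYMBOL[1:3])
bad2 <- data.frame(grp2_Beta$my_data.SYMBOL[1:3])
print(names(bad1))
print(names(bad2))

# Giving both columns the same explicit name lets semi_join match on it
top1 <- data.frame(SYMBOL = grp1_Beta$my_data.SYMBOL[1:3], stringsAsFactors = FALSE)
top2 <- data.frame(SYMBOL = grp2_Beta$my_data.SYMBOL[1:3], stringsAsFactors = FALSE)
common <- semi_join(top1, top2, by = "SYMBOL")
print(common)

# Same genes as a plain vector, as in the intersect() approach above
common_vec <- intersect(top1$SYMBOL, top2$SYMBOL)
print(common_vec)
```

If you only need the gene names, the intersect() vector is enough. The semi_join version is mainly useful if you want to keep the result as a data frame, for example to carry other columns from grp1_Beta along.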