Finding common rows in R


While trying to get my data ready for analysis, I can't seem to do this correctly. Suppose I have datasets in this form:

df1

V1  V2df1
a   H
b   Y
c   Y

df2

V1  V2df2
a   Y
j   H
b   Y

and three more (five datasets of different lengths altogether). What I am trying to do is the following. First, I must find all common elements in the first column (V1); in this case those are a and b. Then, based on those common elements, I want to build a joined dataset in which the values of V1 are common to all five datasets and the values from the other columns are appended in the same row. To explain with an example, my result should look something like:

V1  V2df1  V2df2
a   H      Y
b   Y      Y

I managed to get some code working, but apparently the results are not correct. What I did: read the first column of each dataset into a variable (for example, a<-df1[,1] and so on) and find the common values like:

red<-Reduce(intersect, list(a,b,c,d,e))

then I filtered specific datasets like:

df1 <-  unique(filter(df1, V1 %in% red))

I ordered every dataset by V1:

df1<-data.frame(df1[with(df1, order(V1)),])

and deleted duplicates (of elements in the first column):

df1<- df1[unique(df1$V1),]

I then created a new dataset with:

newdata<-data.frame(V1common=df1[,1], V2df1=df1[,2],V2df2=df2[,2]...)

Here ... stands for the remaining datasets, five in all. I actually got the same number of rows in each dataset (a good sign, since that matches the size of the intersection) and then appended the other sorted columns, but something doesn't add up. Thanks for any advice. (I omitted the library calls and such; the code is for illustrative purposes.)


Solution

  • You can use join_all from the plyr package:

    library(plyr)
    df <- join_all(list(df1,df2,df3,df4, df5), by = 'V1', type = 'inner')
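
If you would rather avoid the plyr dependency, the same inner join can be sketched in base R by chaining merge() with Reduce(). This is only a minimal sketch under assumptions: df3 below is made up for illustration (the question shows only df1 and df2), and dedupe is a hypothetical helper name. It also shows one likely reason the manual approach misaligned: df1[unique(df1$V1), ] indexes rows by the values of V1 rather than keeping the first row for each value, whereas !duplicated() does the latter.

```r
# Toy data modelled on the question; df3 is invented for illustration only.
df1 <- data.frame(V1 = c("a", "b", "c"), V2df1 = c("H", "Y", "Y"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(V1 = c("a", "j", "b"), V2df2 = c("Y", "H", "Y"),
                  stringsAsFactors = FALSE)
df3 <- data.frame(V1 = c("b", "a", "z"), V2df3 = c("H", "H", "Y"),
                  stringsAsFactors = FALSE)

# Hypothetical helper: keep the first row for each V1 value, so that
# repeated keys cannot multiply rows during the join.
dedupe <- function(d) d[!duplicated(d$V1), ]
frames <- lapply(list(df1, df2, df3), dedupe)

# merge() performs an inner join by default (all = FALSE);
# Reduce() applies it pairwise across the whole list.
joined <- Reduce(function(x, y) merge(x, y, by = "V1"), frames)
print(joined)
```

Because the rows are matched on V1 by merge() itself, no manual sorting or column-by-column assembly is needed, so the value columns cannot drift out of alignment. The same list can be extended to all five datasets.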