Search code examples
rloopscomparisonapply

all vs. all comparisons between two dataframes using apply


I am using R and I have three dataframes with two same columns (ID and timestamps), but different number of rows.

ID       timeStamp
a   2018-04-17 10:47:45
a   2018-04-17 10:47:48
a   2018-04-17 10:47:48
a   2018-04-17 10:47:48
a   2018-04-17 10:49:23
a   2018-04-17 10:50:02
a   2018-04-17 10:51:34
a   2018-04-17 10:51:36
a   2018-04-17 10:51:38


ID       timeStamp
b   2018-04-17 10:32:17
b   2018-04-17 10:46:18
b   2018-04-17 10:47:18
b   2018-04-17 10:49:20
b   2018-04-17 10:52:22
b   2018-04-17 10:55:25
b   2018-04-17 10:57:29

ID       timeStamp
c   2018-04-17 10:32:17
c   2018-04-17 10:46:18
c   2018-04-17 10:47:18
c   2018-04-17 10:49:20
c   2018-04-17 10:52:22
c   2018-04-17 10:55:25

I would like to compare all of the timestamp values in three dataframes and compute points conditional on the number of times observations in dataframe A,B and C are within a specific time range. For example, If two obs are within 5 mins range, I want to assign 10 points. If the values are exactly same, it will get 5 points. Otherwise, no point will be added. I would like to get the total points between two dataframes

I tried to make the model using for loop, but it takes so long when I compare huge number of rows.

m= 0
n= 0
for (i in 1:nrow(A)){
  for (j in 1:nrow(B)){if (difftime(A[i,"tStamp"],B[j,"tStamp"],units = "secs") < 300 & A(Role1[i,"tStamp"],B[j,"tStamp"],units = "secs") >0 ) {m=m+10}
else if ( difftime(A[i,"tStamp"],B[j,"tStamp"],units = "secs") == 0){m=m+5}
else if (difftime(B[j,"tStamp"],A[i,"tStamp"],units = "secs") < 300 & difftime(B[j,"tStamp"],A[i,"tStamp"],units = "secs") >0) {n=n+10}
else if ( difftime(B[j,"tStamp"],A[i,"tStamp"],units = "secs") == 0){n=n+5}}

Would there be a good way to do this using apply function? I believe it would be much more efficient & faster than for loop. The expected output will be like

ID1     ID2               m              n
A        B     
A        C      
B        C   

m and n will be the total added points for the relationship of two dataframes after the loop. Any help would be appreciated.


Solution

  • I am not sure what kind of final output you are looking for but to start with you can use outer with difftime.

    mat <- outer(df1$timeStamp, df2$timeStamp, difftime, units = 'mins')
    mat
    
    #Time differences in mins
    #      [,1] [,2] [,3]  [,4]   [,5]  [,6]  [,7]
    # [1,] 15.5 1.45 0.45 -1.58 -4.617 -7.67 -9.73
    # [2,] 15.5 1.50 0.50 -1.53 -4.567 -7.62 -9.68
    # [3,] 15.5 1.50 0.50 -1.53 -4.567 -7.62 -9.68
    # [4,] 15.5 1.50 0.50 -1.53 -4.567 -7.62 -9.68
    # [5,] 17.1 3.08 2.08  0.05 -2.983 -6.03 -8.10
    # [6,] 17.8 3.73 2.73  0.70 -2.333 -5.38 -7.45
    # [7,] 19.3 5.27 4.27  2.23 -0.800 -3.85 -5.92
    # [8,] 19.3 5.30 4.30  2.27 -0.767 -3.82 -5.88
    # [9,] 19.4 5.33 4.33  2.30 -0.733 -3.78 -5.85
    

    From here you might want to take absolute of the value available and turn values less than 5 to 10 and perform the calculations accordingly.