Search code examples
rdatasetconditional-statementsmultiple-databases

integrate databases of different rows by multiple conditions in r


I tried merge and a all series of for/if loops of which the best I will report. I read several posts but I could not find any that does quite match.

I have 2 databases, one of 360 rows and the other one of 60 rows. I would like to add some columns present in the smaller one to the bigger one by four condition repeating the same number by another condition so to have a 360 rows dataset.

familiarity    pb_type  sex trial   lower      upper       fit
 mate          tet      m     1     1.760949   3.780915   2.809002
 familiar      tet      m     1     2.020926   3.986183   3.021357
 unfamiliar    tet      m     1     2.570472   4.499613   3.530639
 mate          stack    m     1     3.479230   5.441066   4.500652
 familiar      stack    m     1     2.934518   4.89067    3.904378

"familiarty", "pb_type", "sex" and "trial" are my conditions to select the rows and creates uniques combinations.

I would like to add the other columns "lower", "upper", and "fit" to my bigger dataset. Each of this row has to be repeated 6 times following the condition "id" that in my bigger database has

I cannot use rep or so because the order of the conditions in different in the 2 dataset (e.g. in the familiarity column "mate", does not comes first in both)

Here is what I tried: the big dataset is "raw data", the small is "simulation"

max_count <- length(raw_data[,1])
count = 1
raw_data$lower <- NA
raw_data$upper <- NA
raw_data$mean <- NA

for(i in 1:length(simulation[,1])){    

    if(count<=max_count)
 {
     j<-count

     while(raw_data[j,3] == simulation[i,3] && raw_data[j,4] == simulation[i,4]&& raw_data[j,7] == simulation[i,2] && raw_data[j,8] == simulation[i,1]){


         raw_data$lower[[j]] <- simulation$lower[[i]]
         raw_data$upper[[j]] <- simulation$upper[[i]]
         raw_data$mean[[j]] <-  simulation$fit[[i]]   

     }
     count <-count+1

  }

} 

Unfortunately it goes into a infinite loop always at the same point, I think because of the different order ot the conditions.

Unfortunately I am not good with the package dplyr...that might be the solution.

I realize that the question is long and complicated, please help me in refine it!

thanks for any input all the best


Solution

  • If I'm understanding your question correctly, you want to join using combinations of the first four variables of the data table you've shown as the key? Please clarify if this is not the case, and it might help to see the other data table you are trying to merge.

    That said, is this what you want?

    library(dplyr)
    left_join(raw_data, simulation, by = c("familiarity","pb_type","sex","trial"))
    

    It may not be necessary to specify the join variables depending on what your other data table looks like, but it can't hurt.