
Loop over two datasets in R to match the values of all rows from one dataset against a single column of another dataset


I am trying to write a loop in R to perform some iteration on two datasets called datasetA and datasetB.

datasetA has 600 entries and datasetB has 200,000 entries. For each entry in datasetA, I want to do the following:

If the values of V2 in the two datasets are equal, calculate the ppm:

(datasetA$V3 - datasetB$V3) / datasetA$V3 * 1000000

If |ppm| < 10, write the ppm value into column V4 of datasetB and the corresponding name from datasetA$V1 into column V1 of datasetB.
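As a quick check of the rule above, here is a minimal sketch of the ppm calculation for a single pair of values. The helper name ppm_error is hypothetical (it is not part of the original post), and the numbers are V3 values taken from the example tables in the question.

```r
# Hypothetical helper (not from the original post): signed ppm difference
# of an observed value relative to a reference value.
ppm_error <- function(ref, obs) (ref - obs) / ref * 1e6

janine <- ppm_error(88.000123, 88.0004)  # about -3.1477
abs(janine) < 10                          # TRUE, so this pair counts as a match

far <- ppm_error(88.000123, 100)
abs(far) < 10                             # FALSE, far outside the 10 ppm window
```

Note that the sign is kept in the stored value; only the comparison against 10 uses the absolute value.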

Say this is an example of datasetA (the real one has 600 entries):

datasetA<- read.table(text='Alex    1   50.00042
John    1   60.000423
Janine    3   88.000123
Aline    3   117
Mark    2    79.9999')


And this is an example of datasetB (the real one has 200,000 entries):

datasetB<- read.table(text='NA    1   50.0001    NA
NA    1   50.00032    NA
NA    2   70    NA
NA    2   80    NA
NA    3   88.0004    NA
NA    3   100    NA
NA    3   101    NA
NA    2    102    NA')


The final table should look like this:

datasetC <- read.table(text='Alex    1   50.0001    6.459945
Alex    1   50.00032    2.059983
NA    2   70    NA
Mark    2   80    -1.25
Janine    3   88.0004    -3.14772
NA    3   100    NA
NA    3   101    NA
NA    2    102    NA')



Solution

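    data <- datasetB
    # Loop over every row of both tables instead of hard-coding 1:5 and 1:8
    for (i in seq_len(nrow(datasetA))) {
      for (j in seq_len(nrow(datasetB))) {
        # ppm difference relative to the datasetA value
        ppm <- (datasetA$V3[i] - datasetB$V3[j]) / datasetA$V3[i] * 1e6
        # same V2 value and within 10 ppm: copy the name and the ppm value
        if (datasetA$V2[i] == datasetB$V2[j] && abs(ppm) < 10) {
          data[j, 1] <- as.character(datasetA[i, 1])
          data[j, 4] <- ppm
        }
      }
    }
    data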
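With 600 rows in datasetA and 200,000 in datasetB, the nested loop above runs about 120 million iterations. A possible alternative, sketched here under the assumption that the columns are named V1 to V4 as in the question, keeps the loop over datasetA but compares each of its rows against all of datasetB at once. The names result and hit are illustrative and not from the original answer. As in the nested loop, if several datasetA rows match the same datasetB row, the last one wins.

```r
# Example data from the question; stringsAsFactors = FALSE keeps V1 as text on R < 4.0.
datasetA <- read.table(text = 'Alex    1   50.00042
John    1   60.000423
Janine    3   88.000123
Aline    3   117
Mark    2    79.9999', stringsAsFactors = FALSE)

datasetB <- read.table(text = 'NA    1   50.0001    NA
NA    1   50.00032    NA
NA    2   70    NA
NA    2   80    NA
NA    3   88.0004    NA
NA    3   100    NA
NA    3   101    NA
NA    2    102    NA', stringsAsFactors = FALSE)

result <- datasetB
# The all-NA columns are read as logical; give them their final types up front.
result$V1 <- as.character(result$V1)
result$V4 <- as.numeric(result$V4)

for (i in seq_len(nrow(datasetA))) {
  # ppm of this datasetA row against every datasetB row in one vectorised step
  ppm <- (datasetA$V3[i] - datasetB$V3) / datasetA$V3[i] * 1e6
  # rows with the same V2 value that fall within 10 ppm
  hit <- datasetB$V2 == datasetA$V2[i] & abs(ppm) < 10
  result$V1[hit] <- datasetA$V1[i]
  result$V4[hit] <- ppm[hit]
}
result
```

On the example data this labels rows 1 and 2 as Alex, row 4 as Mark and row 5 as Janine, and leaves the other rows as NA. With datasetA exactly as printed in the question, the two Alex values come out near 6.40 and 2.00 rather than the 6.459945 and 2.059983 shown in the expected table; those figures would be consistent with a datasetA value of 50.000423.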