Search code examples
rmergedata-cleaning

R: if values of two columns of two data frames are equal, add column of one of the two data frames


I have two data frames

columns df1= a, b, c, d

columns df2= e, f, c, d

I want to say that if df1$a == df2$e and at the same time df1$b == df2$f I want to print in a new column df1$x the values of the corresponding df2$c.

So I wrote this command df1$x <- ifelse(df1$a %in% df2$e & df1$b %in% df2$f, df2$c, ifelse (NULL))

But the values in df1x are not correct. I think I have to do a while loop, but I have no idea from where to start.

Nicoletta


Solution

  • To merge two data frames by multiple columns you can use merge and corresponding by.x and by.y. Both of those accept vector of column names. Parameter all.x means all data from first df1 will be preserved.

    df1 <-  merge(df1, df2[,c("e","f","c")], by.x=c("a","b"), by.y=c("e","f"), all.x=T)
    
    colnames(df1) <- c("a","b","c","d","x")
    

    For more information see ?merge. For more complex operation you can check out package dplyr and its function left_join.

    Result on random data

    df1 <- data.frame(a = sample(1:5), b= sample(1:5), c=runif(5), d = runif(5))
    df2 <- data.frame(e = df1$a, f = df1$b, c = runif(5), d = runif(5))
    df2$e[3] <- 7
    

    Output

    > df1
      a b          c          d
    1 5 4 0.76677063 0.92123552
    2 4 1 0.93524320 0.09275425
    3 3 2 0.01121468 0.12035981
    4 1 5 0.72992427 0.87711572
    5 2 3 0.11680937 0.93696597
    > df2$e[3] <- 7
    > df2
      e f         c         d
    1 5 4 0.6251662 0.1549575
    2 4 1 0.8464672 0.8702837
    3 7 2 0.5394273 0.4290171
    4 1 5 0.4061817 0.9072905
    5 2 3 0.3376456 0.4291463
    > merge(df1, df2[,c("e","f","c")], by.x=c("a","b"), by.y=c("e","f"), all.x=T)
      a b          c          d         x
    1 1 5 0.72992427 0.87711572 0.4061817
    2 2 3 0.11680937 0.93696597 0.3376456
    3 3 2 0.01121468 0.12035981        NA
    4 4 1 0.93524320 0.09275425 0.8464672
    5 5 4 0.76677063 0.92123552 0.6251662