
How to add a new column to a data frame in R that is the -ln of the variable "hr" from a second data frame where the two "age" variables match?


My goal is to create a new column "hoLj" in df1 that is the -ln of "hr" from df2 wherever the age in df1 matches age2 in df2.

df1 <- data.frame(age = c("1","2","4","5","7","8"), dif = c("y", "n", "y", "n","n","y"))

df2 <- data.frame(age2 = c("1","2","3","4","5","6","7","8"), hr = c(56, 57, 23, 46, 45, 19, 21, 79))

With the new column added, df1 should look like this:

age   dif    hoLj
1      y      -ln(56)
2      n      -ln(57)
4      y      -ln(46)
5      n      -ln(45)
7      n      -ln(21)
8      y      -ln(79)

Thank you!


Solution

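  • We can do a left join and then take the negative natural log. The key columns have different names (age in df1, age2 in df2), so the join needs an explicit by:

    library(dplyr)
    # match df1$age against df2$age2; rows of df1 with no match get NA
    left_join(df1, df2, by = c("age" = "age2")) %>%
         mutate(hoLj = -log(hr)) %>%   # log() is the natural log in R
         select(-hr)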
    

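    Or with data.table, which adds the column to df1 by reference:

    library(data.table)
    # on = .(age = age2) pairs df1's age with df2's age2
    setDT(df1)[df2, hoLj := -log(hr), on = .(age = age2)]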
    df1
    #   age dif      hoLj
    #1:   1   y -4.025352
    #2:   2   n -4.043051
    #3:   4   y -3.828641
    #4:   5   n -3.806662
    #5:   7   n -3.044522
    #6:   8   y -4.369448
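
The same lookup can also be sketched in base R with match(), without loading a package. This is an alternative technique, not part of the answer above. It assumes the ages in df2 are unique, and the helper name idx is only illustrative:

```r
# Example data from the question
df1 <- data.frame(age = c("1", "2", "4", "5", "7", "8"),
                  dif = c("y", "n", "y", "n", "n", "y"))
df2 <- data.frame(age2 = c("1", "2", "3", "4", "5", "6", "7", "8"),
                  hr = c(56, 57, 23, 46, 45, 19, 21, 79))

# For each age in df1, find the position of the same age in df2$age2
# (NA where there is no match)
idx <- match(df1$age, df2$age2)

# Look up hr at those positions and take the negative natural log
df1$hoLj <- -log(df2$hr[idx])
df1
```

Ages in df1 that have no counterpart in df2 end up with NA in hoLj, which is the same behaviour as the left join.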