Search code examples
rlookup-tablesqdap

r lookup function returning negative dates


I am new to R.

I have a table with some missing data that I would like to update from a reference table.

Sample data table:

df1=data.frame(id=c(1:5),dob=as.Date(c("1/1/2001"
     ,"2/2/2002",NA,NA,NA),"%m/%d/%Y"),other_data=paste0("data",seq(1:5)))

Sample lookup table:

 bd_ref<-data.frame(id=c(1:100),dob=as.Date(rep("1/1/1999"),"%m/%d/%Y"))

Results should be:

  id dob             other_data
1  1 2001-01-01      data1
2  2 2002-02-02      data2
3  3 1999-01-01      data3
4  4 1999-01-01      data4
5  5 1999-01-01      data5

I first identified the missing data and then tried to use the lookup function from the qdapTools package based on this answer Simple lookup to insert values in an R data frame as follows:

df1[is.na(df1$dob),"dob"]<-df1[is.na(df1$dob),"id"] %l% d_ref[,c("id","dob")]

but got the error:

Error in as.Date.numeric(value) : 'origin' must be supplied

It looks like the results of df1[is.na(df1$dob),"id"] %l% d_ref[,c("id","dob")] were not dates but negative numbers

[1] -719144 -719144 -719144

Is this the correct approach in general to solve this problem? If so, any idea why the negative numbers are being returned and what i can do to fix it? If not, any suggestions for the correct approach.


Solution

  • You can try something like this with library dplyr. I suggest you execute each line and see what is going on with the steps.

    library(dplyr)
    df <- inner_join(df1, bd_ref, by = 'id')
    df$dob.x <- as.Date(ifelse(!is.na(df$dob.x), df$dob.x, df$dob.y), origin = '1970-01-01')
    df <- select(df, -dob.y)
    names(df)[2] <- 'dob'
    df
      id        dob other_data
    1  1 2001-01-01      data1
    2  2 2002-02-02      data2
    3  3 1999-01-01      data3
    4  4 1999-01-01      data4
    5  5 1999-01-01      data5