r, dplyr, tidyr, data-munging

How to create multiple calculated columns based on the value of one column which refers to other columns in R?


I am trying to create a time series that shows what the value of a specific column was at a particular time. All I currently have access to is a table that logs all the changes: the current values of the columns, the dates, and the name of the column that was altered. I would like to create new columns that track what the value of each column was before it was changed. There are over 63 distinct columns referenced in 'Column_name' in the change log.

This is what I currently have:

________________________________________________
Name |  date    |A  | B  |C  |NEW | Column_name|
bob  |  12302019|2  | 23 |153|2   | a          | 
bob  |  12102019|2  | 23 |153|362 | a          |
bob  |  10242019|2  | 23 |153|7   | a          |
john |  10062017|684| 452|1  |254 | c          |
john |  11052018|684| 452|1  |1   | c          |
________________________________________________

This is what I would like help creating:

______________________________________________________________________________________
Name |  date    |A  | B  |C  |NEW | Column_name| a_ at Date| b_ at Date | c_ at Date |
bob  |  12302019|2  | 23 |153|2   | a          |2          | 23         | 153        |
bob  |  12102019|2  | 23 |153|362 | a          |362        | 23         | 153        |
bob  |  10242019|2  | 23 |153|7   | a          |7          | 23         | 153        |
john |  10062017|684| 452|1  |254 | c          |684        | 452        | 254        |
john |  11052018|684| 452|1  |1   | c          |684        | 452        | 1          |
______________________________________________________________________________________

I have tested the solution on the following test data frame, in which 'Column_name' contains only one column name, "A", and it has several factor levels:
'data.frame':   755 obs. of  5 variables:
 $ name       : int  606765182 83595892 538663788 779873188 957405600 522796409 41212559 145402647 304688204 83595892 ...
 $ date       : POSIXct, format: "2019-11-01" "2019-11-01" "2019-10-21" ...
 $ A          : Factor
 $ B          : Factor
 $ C          : Factor

 $ Column_name: Factor w/ 1
 $ NEW        : Factor w/ 8 

Solution

  • base R
    This is a base R solution. It uses sapply/ifelse to build a matrix of the new values, then binds that matrix to the input data frame df1 with cbind.

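    # Columns whose values at each logged date should be reconstructed
    cols_to_change <- c("A", "B", "C")
    
    # For each column: where Column_name names that column (case-insensitive),
    # take the value from NEW; otherwise keep the column's own value
    tmp <- sapply(cols_to_change, function(x){
      x2 <- tolower(x)
      y <- tolower(df1[["Column_name"]])
      ifelse(x2 == y, df1[["NEW"]], df1[[x]])
    })
    colnames(tmp) <- paste0(colnames(tmp), "_new")
    df2 <- cbind(df1, tmp)
    
    rm(tmp)    # final cleanup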
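
One caveat worth illustrating: the str() output in the question shows the columns stored as factors, and ifelse() drops factor attributes, so it returns the underlying integer codes rather than the labels. The sketch below uses a small hypothetical data frame (df_f and its values are made up for illustration) and shows one possible workaround, converting to character first.

```r
# Hypothetical toy data with factor columns (values invented for illustration)
df_f <- data.frame(
  A = factor(c("low", "low", "high")),
  NEW = factor(c("mid", "high", "low")),
  Column_name = factor(c("a", "b", "a"))
)

# TRUE where the logged change refers to column A
is_a <- tolower(df_f$Column_name) == "a"

# On factors, ifelse() returns integer level codes, not labels
codes <- ifelse(is_a, df_f$NEW, df_f$A)

# Converting both branches to character keeps the labels
labels <- ifelse(is_a, as.character(df_f$NEW), as.character(df_f$A))
```

If the factor columns actually hold numbers, as.numeric(as.character(x)) is the usual way to recover the numeric values rather than the codes.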
    

    dplyr solution.

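    library(dplyr)
    
    # Note: funs() has been deprecated since dplyr 0.8.0; this code relies on
    # funs() substituting the column symbol for `.` in the call to newcol()
    newcol <- function(x, DF){
      x <- deparse(substitute(x))    # recover the column name as a string
      x2 <- tolower(x)
      y <- tolower(DF[["Column_name"]])
      ifelse(x2 == y, DF[["NEW"]], DF[[x]])
    }
    
    df1 %>%
      mutate_at(vars(cols_to_change), 
                .funs = funs(new=newcol(., df1)))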
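
The mutate_at()/funs() call depends on funs(), which has been deprecated since dplyr 0.8.0. A possible equivalent for dplyr 1.0.0 or later, offered as a sketch rather than as the answer's own method, uses across() with cur_column() to get the name of the column being processed. The data frame is repeated here so that the block runs on its own.

```r
library(dplyr)

# Same sample data as in the Data section
df1 <- data.frame(
  Name = c("bob", "bob", "bob", "john", "john"),
  date = c(12302019L, 12102019L, 10242019L, 10062017L, 11052018L),
  A = c(2, 2, 2, 684, 684),
  B = c(23, 23, 23, 452, 452),
  C = c(153, 153, 153, 1, 1),
  NEW = c(2, 362, 7, 254, 1),
  Column_name = c("a", "a", "a", "c", "c")
)
cols_to_change <- c("A", "B", "C")

# For each selected column, take NEW where Column_name names that column
# (case-insensitive); otherwise keep the column's own value (.x)
df2 <- df1 %>%
  mutate(across(
    all_of(cols_to_change),
    ~ ifelse(tolower(cur_column()) == tolower(Column_name), NEW, .x),
    .names = "{.col}_new"
  ))
```

With 63 or more tracked columns, cols_to_change could be built from the distinct values of Column_name instead of being typed out, assuming those values match the column names apart from case.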
    

    Data.

    df1 <- 
    structure(list(Name = c("bob", "bob", "bob", "john", "john"), 
    date = c(12302019L, 12102019L, 10242019L, 10062017L, 11052018L),
    A = c(2, 2, 2, 684, 684), B = c(23, 23, 23, 452, 452), 
    C = c(153, 153, 153, 1, 1), NEW = c(2, 362, 7, 254, 1), 
    Column_name = c("a", "a", "a", "c", "c")), 
    row.names = c(NA, -5L), class = "data.frame")
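
As a quick check, the base R approach can be run end to end on this data and compared with the desired table from the question. The expected vectors below are transcribed by hand from that table; the names expected_A, expected_B and expected_C are only illustrative.

```r
# Sample data, as given in the Data section
df1 <- structure(list(
  Name = c("bob", "bob", "bob", "john", "john"),
  date = c(12302019L, 12102019L, 10242019L, 10062017L, 11052018L),
  A = c(2, 2, 2, 684, 684), B = c(23, 23, 23, 452, 452),
  C = c(153, 153, 153, 1, 1), NEW = c(2, 362, 7, 254, 1),
  Column_name = c("a", "a", "a", "c", "c")),
  row.names = c(NA, -5L), class = "data.frame")

cols_to_change <- c("A", "B", "C")

# Base R approach: one matrix column per tracked column
tmp <- sapply(cols_to_change, function(x) {
  ifelse(tolower(x) == tolower(df1[["Column_name"]]), df1[["NEW"]], df1[[x]])
})
colnames(tmp) <- paste0(colnames(tmp), "_new")
df2 <- cbind(df1, tmp)

# Values copied from the "a_ at Date", "b_ at Date", "c_ at Date" columns
expected_A <- c(2, 362, 7, 684, 684)
expected_B <- c(23, 23, 23, 452, 452)
expected_C <- c(153, 153, 153, 254, 1)

# TRUE when the computed columns match the desired table
matches <- isTRUE(all.equal(df2$A_new, expected_A)) &&
  isTRUE(all.equal(df2$B_new, expected_B)) &&
  isTRUE(all.equal(df2$C_new, expected_C))
```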