Search code examples
rcalculated-columns

How to create a calculated column using a lookup table and a formula in R?


I have a dataframe that looks like this:

   V1  V2   V3
   3    4   3
   2    4   3
   4    4   3
   4    4   4
   1    4   2
   4    2   4
   4    4   1
   4    4   2
   3    4   1
   4    4   4
   4    4   2
   4    4   2
   2    1   2
   3    2   3
   3    4   3
   3    4   2
   4    4   2
   4    4   4
   2    3   3
   3    4   1

I also have a lookup table like this:

 V_id   coeff  weight 
  V1   0.82     4.77   
  V2   0.75     4.77   
  V3   0.67     4.77

I want to use these values in the lookup table to create a new calculated column in DF1 using

(V1*coeff+V2*coeff+V3*coeff)/weight

The final dataframe should look like this.

   V1   V2  V3  new_column
   3    4   3   1.566037736
   2    4   3   1.394129979
   4    4   3   1.737945493
   4    4   4   1.878406709
   1    4   2   1.081761006
   4    2   4   1.5639413
   4    4   1   1.457023061
   4    4   2   1.597484277
   3    4   1   1.285115304
   4    4   4   1.878406709
   4    4   2   1.597484277
   4    4   2   1.597484277
   2    1   2   0.78197065
   3    2   3   1.251572327
   3    4   3   1.566037736
   3    4   2   1.42557652
   4    4   2   1.597484277
   4    4   4   1.878406709
   2    3   3   1.236897275
   3    4   1   1.285115304

I have to do this for a data frame with 1125 columns.


Solution

  • Edit Updated answer to updated question (data frame has 1,125 columns):

    df1_V <- as.matrix(df1) # or select the "V" columns using df1[, 1:1125]
    df1$new_column <- df1_V %*% df2$coef / df2$weight[1]
    

    This is a general solution which will work for any number of columns as long as the columns of df1 are arranged in the same manner (across the columns of the data frame) as the coef values are ordered (row-wise) in df2, and the number of columns in df1 equals the number of rows in df2, that is ncol(df1_V) = nrow(df2).


    Answer (to original question):

    library(dplyr)
    
    df %>%
      mutate(new_column = (V1*0.82 + V2*0.75 + V3*0.67) / 4.77)
    

       V1 V2 V3 new_column
    1   3  4  3  1.5660377
    2   2  4  3  1.3941300
    3   4  4  3  1.7379455
    4   4  4  4  1.8784067
    5   1  4  2  1.0817610
    6   4  2  4  1.5639413
    ...
    

    Alternative:

    df1$new_column <- as.matrix(df1) %*% c(0.82, 0.75, 0.67) / 4.77