
How to create a function to conditionally execute arithmetic operations in multiple columns


Given the sample data sampleDT below, I would appreciate help creating a function that efficiently does the following:

For each variable whose name begins with dollar:

  • do 3-(5/j) in those rows where sampleDT$employer==1;

  • do 2*j in those rows where sampleDT$employer==0;

  • put the result of the operation in a new variable located in the column next to the one it was based on;

  • keep the values of dollar.wage_1 unchanged;

  • put the output of the operation in the new variable euro.wage_x, whose name only replaces dollar by euro in the source variable dollar.wage_x, where x is the index of the dollar.wage variable;

  • create new variables named division.wage_x, which contain, for each pair dollar.wage_x and euro.wage_x, the result of dividing dollar.wage_x by euro.wage_x.

Here j stands for the values taken by the variables dollar.wage_1:dollar.wage_10.
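
As a quick check of the two rules, here is a minimal sketch on two values taken from dollar.wage_1 in the sample data (rows 1 and 3). The vector names `j`, `employer` and `euro` are only illustrative.

```r
# Values from rows 1 and 3 of dollar.wage_1, with their employer flags
j        <- c(1.94, 3.16)
employer <- c(1L, 0L)

# 3 - 5/j where employer is 1, 2*j where employer is 0
euro <- ifelse(employer == 1, 3 - 5 / j, 2 * j)
euro
```

The first element is 3 - 5/1.94 (about 0.4227) and the second is 2 * 3.16 = 6.32.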


Sample data

sampleDT<-structure(list(id = 1:10, N = c(10L, 10L, 10L, 10L, 10L, 10L, 
    10L, 10L, 10L, 10L), A = c(62L, 96L, 17L, 41L, 212L, 143L, 143L, 
    143L, 73L, 73L), B = c(3L, 1L, 0L, 2L, 170L, 21L, 0L, 33L, 62L, 
    17L), C = c(0.05, 0.01, 0, 0.05, 0.8, 0.15, 0, 0.23, 0.85, 0.23
    ), employer = c(1L, 1L, 0L, 1L, 0L, 1L, 1L, 0L, 0L, 0L), F = c(0L, 
    0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L), G = c(1.94, 1.19, 1.16, 
    1.16, 1.13, 1.13, 1.13, 1.13, 1.12, 1.12), H = c(0.14, 0.24, 
    0.28, 0.28, 0.21, 0.12, 0.17, 0.07, 0.14, 0.12), dollar.wage_1 = c(1.94, 
    1.19, 3.16, 3.16, 1.13, 1.13, 2.13, 1.13, 1.12, 1.12), dollar.wage_2 = c(1.93, 
    1.18, 3.15, 3.15, 1.12, 1.12, 2.12, 1.12, 1.11, 1.11), dollar.wage_3 = c(1.95, 
    1.19, 3.16, 3.16, 1.14, 1.13, 2.13, 1.13, 1.13, 1.13), dollar.wage_4 = c(1.94, 
    1.18, 3.16, 3.16, 1.13, 1.13, 2.13, 1.13, 1.12, 1.12), dollar.wage_5 = c(1.94, 
    1.19, 3.16, 3.16, 1.14, 1.13, 2.13, 1.13, 1.12, 1.12), dollar.wage_6 = c(1.94, 
    1.18, 3.16, 3.16, 1.13, 1.13, 2.13, 1.13, 1.12, 1.12), dollar.wage_7 = c(1.94, 
    1.19, 3.16, 3.16, 1.14, 1.13, 2.13, 1.13, 1.12, 1.12), dollar.wage_8 = c(1.94, 
    1.19, 3.16, 3.16, 1.13, 1.13, 2.13, 1.13, 1.12, 1.12), dollar.wage_9 = c(1.94, 
    1.19, 3.16, 3.16, 1.13, 1.13, 2.13, 1.13, 1.12, 1.12), dollar.wage_10 = c(1.94, 
    1.19, 3.16, 3.16, 1.13, 1.13, 2.13, 1.13, 1.12, 1.12)), row.names = c(NA, 
    -10L), class = "data.frame")

Head output

id  N  A B    C employer F    G    H dollar.wage_1 dollar.wage_2 dollar.wage_3 dollar.wage_4 dollar.wage_5 dollar.wage_6 dollar.wage_7 dollar.wage_8 dollar.wage_9 dollar.wage_10
 1 10 62 3 0.05        1 0 1.94 0.14          1.94          1.93          1.95          1.94          1.94          1.94          1.94          1.94          1.94           1.94
 2 10 96 1 0.01        1 0 1.19 0.24          1.19          1.18          1.19          1.18          1.19          1.18          1.19          1.19          1.19           1.19
 3 10 17 0 0.00        0 0 1.16 0.28          3.16          3.15          3.16          3.16          3.16          3.16          3.16          3.16          3.16           3.16

I am looking for an efficient way to do this because my actual dataset has over 1000 dollar.wage_x variables.

Thanks in advance for any help.


Solution

  • Using data.table:

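    library(data.table)
    setDT(sampleDT)  # convert to a data.table by reference
    # source columns and the matching names for the new columns
    o_cols <- grep("^dollar", names(sampleDT), value = TRUE)
    n_cols <- sub("^dollar", "euro", o_cols)
    # 3 - 5/j where employer == 1, 2*j otherwise, applied to every dollar column
    sampleDT[, (n_cols) := lapply(.SD, function(j) ifelse(employer == 1, 3 - 5 / j, 2 * j)), .SDcols = o_cols]
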
    > sampleDT
        id  N   A   B    C employer F    G    H dollar.wage_1 dollar.wage_2 dollar.wage_3 dollar.wage_4 dollar.wage_5 dollar.wage_6 dollar.wage_7
     1:  1 10  62   3 0.05        1 0 1.94 0.14          1.94          1.93          1.95          1.94          1.94          1.94          1.94
     2:  2 10  96   1 0.01        1 0 1.19 0.24          1.19          1.18          1.19          1.18          1.19          1.18          1.19
     3:  3 10  17   0 0.00        0 0 1.16 0.28          3.16          3.15          3.16          3.16          3.16          3.16          3.16
     4:  4 10  41   2 0.05        1 0 1.16 0.28          3.16          3.15          3.16          3.16          3.16          3.16          3.16
     5:  5 10 212 170 0.80        0 0 1.13 0.21          1.13          1.12          1.14          1.13          1.14          1.13          1.14
     6:  6 10 143  21 0.15        1 1 1.13 0.12          1.13          1.12          1.13          1.13          1.13          1.13          1.13
     7:  7 10 143   0 0.00        1 1 1.13 0.17          2.13          2.12          2.13          2.13          2.13          2.13          2.13
     8:  8 10 143  33 0.23        0 1 1.13 0.07          1.13          1.12          1.13          1.13          1.13          1.13          1.13
     9:  9 10  73  62 0.85        0 1 1.12 0.14          1.12          1.11          1.13          1.12          1.12          1.12          1.12
    10: 10 10  73  17 0.23        0 1 1.12 0.12          1.12          1.11          1.13          1.12          1.12          1.12          1.12
        dollar.wage_8 dollar.wage_9 dollar.wage_10 euro.wage_1 euro.wage_2 euro.wage_3 euro.wage_4 euro.wage_5 euro.wage_6 euro.wage_7 euro.wage_8 euro.wage_9
     1:          1.94          1.94           1.94   0.4226804   0.4093264   0.4358974   0.4226804   0.4226804   0.4226804   0.4226804   0.4226804   0.4226804
     2:          1.19          1.19           1.19  -1.2016807  -1.2372881  -1.2016807  -1.2372881  -1.2016807  -1.2372881  -1.2016807  -1.2016807  -1.2016807
     3:          3.16          3.16           3.16   6.3200000   6.3000000   6.3200000   6.3200000   6.3200000   6.3200000   6.3200000   6.3200000   6.3200000
     4:          3.16          3.16           3.16   1.4177215   1.4126984   1.4177215   1.4177215   1.4177215   1.4177215   1.4177215   1.4177215   1.4177215
     5:          1.13          1.13           1.13   2.2600000   2.2400000   2.2800000   2.2600000   2.2800000   2.2600000   2.2800000   2.2600000   2.2600000
     6:          1.13          1.13           1.13  -1.4247788  -1.4642857  -1.4247788  -1.4247788  -1.4247788  -1.4247788  -1.4247788  -1.4247788  -1.4247788
     7:          2.13          2.13           2.13   0.6525822   0.6415094   0.6525822   0.6525822   0.6525822   0.6525822   0.6525822   0.6525822   0.6525822
     8:          1.13          1.13           1.13   2.2600000   2.2400000   2.2600000   2.2600000   2.2600000   2.2600000   2.2600000   2.2600000   2.2600000
     9:          1.12          1.12           1.12   2.2400000   2.2200000   2.2600000   2.2400000   2.2400000   2.2400000   2.2400000   2.2400000   2.2400000
    10:          1.12          1.12           1.12   2.2400000   2.2200000   2.2600000   2.2400000   2.2400000   2.2400000   2.2400000   2.2400000   2.2400000
        euro.wage_10
     1:    0.4226804
     2:   -1.2016807
     3:    6.3200000
     4:    1.4177215
     5:    2.2600000
     6:   -1.4247788
     7:    0.6525822
     8:    2.2600000
     9:    2.2400000
    10:    2.2400000
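
The data.table code above creates the euro.wage_x columns and leaves the dollar.wage_x columns untouched, but it appends the new columns at the end and does not build the division.wage_x columns. A possible extension, wrapped in a function, is sketched below. The helper name `add_euro_cols` and the toy data frame (the first three rows of a few sample columns) are assumptions for illustration, not part of the original answer. The sketch also assumes that placing all non-dollar columns first is acceptable, which matches the layout of sampleDT.

```r
library(data.table)

# Hypothetical helper: for every column starting with "dollar", add the
# matching euro.* and division.* columns and place them next to their source.
add_euro_cols <- function(dt) {
  dt <- setDT(copy(dt))  # work on a copy so the caller's object is not modified

  o_cols <- grep("^dollar", names(dt), value = TRUE)
  e_cols <- sub("^dollar", "euro", o_cols)
  d_cols <- sub("^dollar", "division", o_cols)

  # euro.wage_x: 3 - 5/j where employer == 1, 2*j otherwise
  dt[, (e_cols) := lapply(.SD, function(j) ifelse(employer == 1, 3 - 5 / j, 2 * j)),
     .SDcols = o_cols]

  # division.wage_x: dollar.wage_x / euro.wage_x, one pair at a time
  for (k in seq_along(o_cols)) {
    set(dt, j = d_cols[k], value = dt[[o_cols[k]]] / dt[[e_cols[k]]])
  }

  # interleave as dollar_1, euro_1, division_1, dollar_2, ... after the other columns
  other <- setdiff(names(dt), c(o_cols, e_cols, d_cols))
  setcolorder(dt, c(other, as.vector(rbind(o_cols, e_cols, d_cols))))
  dt[]
}

# Toy input: first three rows of id, employer, dollar.wage_1 and dollar.wage_2
toy <- data.frame(id = 1:3, employer = c(1L, 1L, 0L),
                  dollar.wage_1 = c(1.94, 1.19, 3.16),
                  dollar.wage_2 = c(1.93, 1.18, 3.15))
res <- add_euro_cols(toy)
names(res)
```

Calling `add_euro_cols(sampleDT)` on the full sample data should work the same way. `set()` is used in the loop because it avoids the overhead of repeated `[.data.table` calls, which may matter with more than 1000 columns. Note that a euro value of exactly 0 (j equal to 5/3 with employer == 1) would give Inf in the division column.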