R multiple column subtraction


The reduced version of my dataset is as shown below.

 Z_dog1_mu1  Z_dog2_mu1  Z_dog3_mu1  Z_cat1_mu1  Z_cat2_mu1   Z_cat3_mu1                                                
 0.0000      0.0000      0.0001      0.0005      0.0043       0.0045   
 0.0039     -0.0016     -0.0102     -0.0009      0.0421      -0.0139
-0.0380     -0.0733      0.0196      0.0261      0.0628       0.0463
-0.1036      0.0784     -0.0529      0.1053     -0.0511      -0.0138

I am trying to subtract the cat* columns from the dog* columns like this

 df$diff1 <- df$Z_dog1_mu1   -  df$Z_cat1_mu1
 df$diff2 <- df$Z_dog2_mu1   -  df$Z_cat2_mu1
 df$diff3 <- df$Z_dog3_mu1   -  df$Z_cat3_mu1  

How can I do this more efficiently, without manually subtracting each pair of columns as shown above? I have around 100 dog columns (Z_dog1_mu1...Z_dog100_mu1) and 100 cat columns (Z_cat1_mu1...Z_cat100_mu1). Any advice is much appreciated.


Solution

  • We find the positions of the 'dog' columns and the 'cat' columns separately with grep(), then subtract the two subsets of the data frame in a single step

    nmdog <- grep("^Z_dog\\d+_mu", names(df))
    nmcat <- grep("^Z_cat\\d+_mu", names(df))
    df[paste0("diff", seq_along(nmdog))] <- df[nmdog] - df[nmcat]
    df
    #  Z_dog1_mu1 Z_dog2_mu1 Z_dog3_mu1 Z_cat1_mu1 Z_cat2_mu1 Z_cat3_mu1   diff1   diff2   diff3
    #1     0.0000     0.0000     0.0001     0.0005     0.0043     0.0045 -0.0005 -0.0043 -0.0044
    #2     0.0039    -0.0016    -0.0102    -0.0009     0.0421    -0.0139  0.0048 -0.0437  0.0037
    #3    -0.0380    -0.0733     0.0196     0.0261     0.0628     0.0463 -0.0641 -0.1361 -0.0267
    #4    -0.1036     0.0784    -0.0529     0.1053    -0.0511    -0.0138 -0.2089  0.1295 -0.0391
    

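    NOTE: As shown in the example, we assume that the 'dog' column sequence corresponds to the 'cat' column sequence, i.e. both sets appear in the same order (1:100), so that the n-th 'dog' column is paired with the n-th 'cat' column.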
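
    If that ordering assumption cannot be guaranteed, one possible variation is to pair the columns by the number embedded in their names instead of by position. The following is only a sketch under the assumption that every column is named exactly Z_dog<N>_mu1 or Z_cat<N>_mu1; the helper names dog_cols, ids and cat_cols are made up for illustration, and the small data frame is the first two rows of the question's data with the columns deliberately shuffled.

```r
# First two rows of the example data, with the columns shuffled on purpose
df <- data.frame(
  Z_cat2_mu1 = c(0.0043, 0.0421),
  Z_dog1_mu1 = c(0.0000, 0.0039),
  Z_dog2_mu1 = c(0.0000, -0.0016),
  Z_cat1_mu1 = c(0.0005, -0.0009)
)

# Names of the dog columns and the numeric id embedded in each name
dog_cols <- grep("^Z_dog\\d+_mu1$", names(df), value = TRUE)
ids <- sub("^Z_dog(\\d+)_mu1$", "\\1", dog_cols)

# Build the name of the matching cat column for every dog column
cat_cols <- paste0("Z_cat", ids, "_mu1")

# Stop early if some dog column has no cat partner
stopifnot(all(cat_cols %in% names(df)))

# Subtract pair by pair; each result column is named after its id
df[paste0("diff", ids)] <- df[dog_cols] - df[cat_cols]
df
```

    Because the 'cat' names are derived from the 'dog' names, the order of the columns in the data frame no longer matters, and a missing partner column raises an error instead of silently pairing the wrong columns.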

    data

    df <- structure(list(Z_dog1_mu1 = c(0, 0.0039, -0.038, -0.1036), Z_dog2_mu1 = c(0, 
    -0.0016, -0.0733, 0.0784), Z_dog3_mu1 = c(1e-04, -0.0102, 0.0196, 
    -0.0529), Z_cat1_mu1 = c(5e-04, -9e-04, 0.0261, 0.1053), Z_cat2_mu1 = c(0.0043, 
    0.0421, 0.0628, -0.0511), Z_cat3_mu1 = c(0.0045, -0.0139, 0.0463, 
    -0.0138)), class = "data.frame", row.names = c(NA, -4L))