r, dataframe, performance, vectorization

How to multiply each column in a data frame by a different value per column


Consider the following data frame:

   x y z
 1 0 0 0
 2 1 0 0
 3 0 1 0
 4 1 1 0
 5 0 0 1
 6 1 0 1
 7 0 1 1
 8 1 1 1
 -------
 x 4 2 1  <--- vector to multiply by 
 

I would like to multiply each column by a separate value, for example c(4,2,1), giving:

   x y z
 1 0 0 0
 2 4 0 0
 3 0 2 0
 4 4 2 0
 5 0 0 1
 6 4 0 1
 7 0 2 1
 8 4 2 1

Code:

pw2 <- c(4, 2, 1)
s01  <- seq_len(2) - 1
df  <- expand.grid(x=s01, y=s01, z=s01)
df

for (d in seq_len(3)) df[,d] <- df[,d] * pw2[d]
df

Question: Find a vectorized solution without a for loop (in base R).

Note that the question "Multiply columns in a data frame by a vector" is ambiguous because it covers two cases:

  • multiply each row of the data frame by a different value.
  • multiply each column of the data frame by a different value.

Both cases are easy to solve with a for loop. Here a vectorized solution is explicitly requested.
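
To make the two readings concrete, here is a minimal sketch using the example data from this question. The per-row multipliers `row_w` are hypothetical and only serve as an illustration. It also shows why simply writing `df * pw2` does not give the per-column result: base R recycles a plain vector down the columns, not across them.

```r
# Rebuild the example data from the question.
pw2 <- c(4, 2, 1)
s01 <- seq_len(2) - 1
df  <- expand.grid(x = s01, y = s01, z = s01)

# Case 1: one multiplier per row. A plain vector is recycled down each
# column, so a vector of length nrow(df) scales row i by row_w[i].
row_w  <- seq_len(nrow(df))   # hypothetical per-row multipliers
by_row <- df * row_w

# Case 2: one multiplier per column (what this question asks for),
# written here as the reference for loop.
by_col <- df
for (d in seq_along(pw2)) by_col[[d]] <- by_col[[d]] * pw2[d]

# Pitfall: df * pw2 recycles pw2 down the columns as well,
# so it does not reproduce by_col.
naive <- df * pw2
```

Comparing `naive` with `by_col` shows the mismatch, which is why the per-column case needs the multipliers to be aligned with the columns explicitly.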


Solution

  • Use sweep to apply a function along a margin of a data frame (margin 2 is the columns):

    sweep(df, 2, pw2, `*`)
    

    or with col:

    df * pw2[col(df)]
    

    output

      x y z
    1 0 0 0
    2 4 0 0
    3 0 2 0
    4 4 2 0
    5 0 0 1
    6 4 0 1
    7 0 2 1
    8 4 2 1
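
The `col` variant can look opaque, so the following sketch unpacks it step by step, under the assumption that `df` and `pw2` are defined as in the question. The intermediate names `idx`, `mult`, and `res_loop` are introduced here for illustration only.

```r
# Rebuild the example data from the question.
pw2 <- c(4, 2, 1)
s01 <- seq_len(2) - 1
df  <- expand.grid(x = s01, y = s01, z = s01)

# col(df) is an integer matrix with the same shape as df; every cell
# holds the index of the column it sits in.
idx <- col(df)

# Indexing pw2 with that matrix yields one multiplier per cell, in
# column-major order: nrow(df) copies of 4, then of 2, then of 1.
mult <- pw2[idx]

# A data frame times a vector of length nrow * ncol is applied
# column by column, so each column meets its own multiplier.
res_col <- df * mult

# Reference result from the for loop in the question.
res_loop <- df
for (d in seq_along(pw2)) res_loop[[d]] <- res_loop[[d]] * pw2[d]
```

In other words, `pw2[col(df)]` builds the same vector as `rep(pw2, each = nrow(df))`, which is what aligns the multipliers with the columns.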
    

    For large data frames, consider collapse::TRA, which is roughly 10x faster than the next-fastest answer here (see benchmark):

    collapse::TRA(df, pw2, "*")
    

    Benchmark:

    bench::mark(
      sweep  = sweep(df, 2, pw2, `*`),
      col    = df * pw2[col(df)],
      '%*%'  = setNames(
        as.data.frame(as.matrix(df) %*% diag(pw2)),
        names(df)
      ),
      TRA    = collapse::TRA(df, pw2, "*"),
      mapply = data.frame(mapply(FUN = `*`, df, pw2)),
      apply  = t(apply(df, 1, \(x) x * pw2)),
      t      = t(t(df) * pw2),
      check  = FALSE  # results differ in class (data frame vs. matrix)
    )
    
    # A tibble: 7 × 13
      expression      min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time
      <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm>
    1 sweep       346.7µs  382.1µs     2427.    1.23KB    10.6   1141     5    470.2ms
    2 col         303.1µs  330.4µs     2760.      784B     8.45  1307     4    473.5ms
    3 %*%          72.8µs   77.9µs    11861.      480B    10.6   5599     5    472.1ms
    4 TRA             5µs    5.5µs   167050.        0B    16.7   9999     1     59.9ms
    5 mapply      117.6µs  127.9µs     7309.      480B    10.6   3442     5    470.9ms
    6 apply       107.8µs  117.9µs     7887.    6.49KB    12.9   3658     6    463.8ms
    7 t            55.3µs   59.7µs    15238.      720B     8.13  5620     3    368.8ms
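
The benchmark is run with `check = FALSE`, presumably because the candidates do not all return the same kind of object. The sketch below compares two of them in base R, assuming `df` and `pw2` as in the question; the names `res_sweep`, `res_t`, and `res_t_df` are introduced here for illustration.

```r
# Rebuild the example data from the question.
pw2 <- c(4, 2, 1)
s01 <- seq_len(2) - 1
df  <- expand.grid(x = s01, y = s01, z = s01)

res_sweep <- sweep(df, 2, pw2, `*`)   # data frame in, data frame out
res_t     <- t(t(df) * pw2)           # t() coerces the data frame to a matrix

class(res_sweep)
class(res_t)

# Transposing puts the original columns into rows, so pw2 (recycled
# down each column of t(df)) lines up with x, y, z; the outer t()
# restores the original shape.
res_t_df <- as.data.frame(res_t)

# Same values once both results are data frames.
all(res_sweep == res_t_df)
```

If a data frame is needed downstream, the matrix-returning variants (`t`, `apply`) need an extra `as.data.frame()` step, which adds some overhead not included in the timings above.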