Consider the following data frame:
x y z
1 0 0 0
2 1 0 0
3 0 1 0
4 1 1 0
5 0 0 1
6 1 0 1
7 0 1 1
8 1 1 1
-------
x 4 2 1 <--- vector to multiply by
I would like to multiply each column by a separate value, for example c(4,2,1), giving:
x y z
1 0 0 0
2 4 0 0
3 0 2 0
4 4 2 0
5 0 0 1
6 4 0 1
7 0 2 1
8 4 2 1
Code:
pw2 <- c(4, 2, 1)
s01 <- seq_len(2) - 1
df <- expand.grid(x=s01, y=s01, z=s01)
df
for (d in seq_len(3)) df[,d] <- df[,d] * pw2[d]
df
Question: Find a vectorized solution without a for loop (in base R).
Note: the question "Multiply columns in a data frame by a vector" is ambiguous because it covers two different queries. Both can easily be solved with a for loop; here a vectorised solution is explicitly requested.
Use sweep to apply a function over the margins of a data frame:
sweep(df, 2, pw2, `*`)
or with col:
df * pw2[col(df)]
Output:
x y z
1 0 0 0
2 4 0 0
3 0 2 0
4 4 2 0
5 0 0 1
6 4 0 1
7 0 2 1
8 4 2 1
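To see why both one-liners work, here is a minimal sketch that rebuilds the question's data and inspects the intermediate pieces. The names idx, mult, plain and res_* are only illustrative and do not come from the original answer. It also shows what goes wrong if pw2 is multiplied in directly, since R then recycles the vector down the rows rather than across the columns.

```r
# Rebuild the example data from the question.
pw2 <- c(4, 2, 1)
s01 <- seq_len(2) - 1
df  <- expand.grid(x = s01, y = s01, z = s01)

# sweep(): MARGIN = 2 pairs the i-th element of STATS with the i-th column.
res_sweep <- sweep(df, MARGIN = 2, STATS = pw2, FUN = `*`)

# col(): an integer matrix holding the column index of every cell.
idx <- col(df)

# Indexing pw2 with it gives one multiplier per cell in column-major order:
# nrow(df) copies of pw2[1], then of pw2[2], then of pw2[3].
mult <- pw2[idx]
res_col <- df * mult

# Pitfall: multiplying by pw2 directly recycles it down the rows of each
# column, so the multipliers no longer line up with the columns.
res_naive <- df * pw2

# Compare values only (names and data frame attributes are stripped).
plain <- function(x) unname(as.matrix(x))
identical(plain(res_sweep), plain(res_col))  # do the two approaches agree?
identical(plain(res_naive), plain(res_col))  # does the naive product match?
```

The comparison goes through as.matrix because expand.grid attaches extra attributes to df that may not survive the arithmetic, so a direct identical() on the data frames could fail even when the values match.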
For large data frames, consider collapse::TRA, which is roughly 10x faster than the other answers (see the benchmark):
collapse::TRA(df, pw2, "*")
Benchmark:
bench::mark(
  sweep = sweep(df, 2, pw2, `*`),
  col = df * pw2[col(df)],
  '%*%' = setNames(
    as.data.frame(as.matrix(df) %*% diag(pw2)),
    names(df)
  ),
  TRA = collapse::TRA(df, pw2, "*"),
  mapply = data.frame(mapply(FUN = `*`, df, pw2)),
  apply = t(apply(df, 1, \(x) x * pw2)),
  t = t(t(df) * pw2),
  check = FALSE
)
# A tibble: 7 × 13
expression min median itr/s…¹ mem_al…² gc/se…³ n_itr n_gc total…⁴
<bch:expr> <bch:tm> <bch:t> <dbl> <bch:by> <dbl> <int> <dbl> <bch:t>
1 sweep 346.7µs 382.1µs 2427. 1.23KB 10.6 1141 5 470.2ms
2 col 303.1µs 330.4µs 2760. 784B 8.45 1307 4 473.5ms
3 %*% 72.8µs 77.9µs 11861. 480B 10.6 5599 5 472.1ms
4 TRA 5µs 5.5µs 167050. 0B 16.7 9999 1 59.9ms
5 mapply 117.6µs 127.9µs 7309. 480B 10.6 3442 5 470.9ms
6 apply 107.8µs 117.9µs 7887. 6.49KB 12.9 3658 6 463.8ms
7 t 55.3µs 59.7µs 15238. 720B 8.13 5620 3 368.8ms
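Because the benchmark runs with check = FALSE, it does not itself confirm that the candidates return the same values (some return a data frame, others a matrix). A rough base R sketch of such a check might look like the following. The helper as_plain and the list name results are made up for illustration, and collapse::TRA is left out so that the snippet needs no extra packages.

```r
# Rebuild the example data from the question.
pw2 <- c(4, 2, 1)
s01 <- seq_len(2) - 1
df  <- expand.grid(x = s01, y = s01, z = s01)

# The base R candidates from the benchmark, without the wrappers that
# only convert the result back to a data frame.
results <- list(
  sweep  = sweep(df, 2, pw2, `*`),
  col    = df * pw2[col(df)],
  matmul = as.matrix(df) %*% diag(pw2),
  mapply = mapply(FUN = `*`, df, pw2),
  apply  = t(apply(df, 1, function(x) x * pw2)),
  t      = t(t(df) * pw2)
)

# Reduce every result to a bare numeric matrix so that only values count,
# not the class or the dimnames.
as_plain <- function(x) unname(as.matrix(x))
reference <- as_plain(results$sweep)

# One logical per method: does it match the sweep() result?
agree <- vapply(
  results,
  function(r) isTRUE(all.equal(as_plain(r), reference)),
  logical(1)
)
agree
```

If any entry of agree were FALSE, the timing for that method would not be comparable, so a check along these lines is worth running before trusting the benchmark table.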