I have two data frames in R. The first contains several feature columns plus a factor column, group, that records which group each sample (row) belongs to. The second has the same feature columns and one row per unique group. From each row of the first data frame I want to subtract the row of the second data frame that has the same value in the group column.
Here is an example of the main dataset:
df_repr <- structure(list(f1 = c(-3.9956064225704,
-0.52380279948658, 0.61089389331505, -3.47273625634875, -4.486918671214,
-6.1761970731672, -4.62305749757367, -4.42540643005429, -3.61613137597131,
-3.29821425516253), f2 = c(-1.57918114753228,
-4.10523012500727, -1.80270009366593, -0.00905317702835884, -0.899585192079915,
-2.89341515186212, 0.0132542126386332, -3.32639898550135, -0.867793877742314,
0.0911950321630834), f3 = c(-6.02532301769732,
-4.90073348094302, -3.73159604513274, -3.55290209472808, -6.63194560195811,
2.69409789701296, -4.17675978927128, -3.84141885970095, -1.20571283849034,
1.54287440902102), group = structure(c(1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L), .Label = c("A", "B"), class = "factor")), class = c("tbl_df", "tbl",
"data.frame"), row.names = c(NA, -10L))
Here is an example dataframe with vectors to be subtracted from each row of the corresponding group of the first dataframe:
to_subtract <- structure(list(group = structure(1:2, .Label = c("A",
"B"), class = "factor"), f1 = c(-2.78048744402161,
-2.33583431665818), f2 = c(-2.56086962108741,
-0.689157827347865), f3 = c(-3.60224982918457,
-0.782365376308658)), row.names = c(NA, -2L), class = c("tbl_df",
"tbl", "data.frame"))
# # A tibble: 2 × 4
# group f1 f2 f3
# <fct> <dbl> <dbl> <dbl>
# 1 A -2.78 -2.56 -3.60
# 2 B -2.34 -0.689 -0.782
I tried to do it like this:
df_repr %>%
  group_by(group) %>%
  mutate(across(where(is.numeric),
                ~ . - to_subtract[to_subtract$group == unique(.$group), -1]))
But I get the following error:
Error in `mutate()`:
ℹ️ In argument: `across(...)`.
ℹ️ In group 1: `group = A`.
Caused by error in `across()`:
! Can't compute column `f1`.
Caused by error in `f1$group`:
! $ operator is invalid for atomic vectors
Expected output for this example:
f1 f2 f3 group
<dbl> <dbl> <dbl> <fct>
1 -1.22 0.982 -2.42 A
2 2.26 -1.54 -1.30 A
3 3.39 0.758 -0.129 A
4 -0.692 2.55 0.0493 A
5 -1.71 1.66 -3.03 A
6 -3.84 -2.20 3.48 B
7 -2.29 0.702 -3.39 B
8 -2.09 -2.64 -3.06 B
9 -1.28 -0.179 -0.423 B
10 -0.962 0.780 2.33 B
You can use powerjoin with conflict = `-`:
library(powerjoin)
power_left_join(df_repr, to_subtract, by = "group", conflict = `-`)
# A tibble: 10 × 4
group f1 f2 f3
<fct> <dbl> <dbl> <dbl>
1 A -1.22 0.982 -2.42
2 A 2.26 -1.54 -1.30
3 A 3.39 0.758 -0.129
4 A -0.692 2.55 0.0493
5 A -1.71 1.66 -3.03
6 B -3.84 -2.20 3.48
7 B -2.29 0.702 -3.39
8 B -2.09 -2.64 -3.06
9 B -1.28 -0.179 -0.423
10 B -0.962 0.780 2.33
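If adding a package is not an option, the same row matching can be sketched in base R with match(). This is only an illustration on small made-up data: the names df, sub, feature_cols and idx are hypothetical and not part of the question, and the sketch assumes every group in df has exactly one row in sub.

```r
# Hypothetical stand-in data (not the question's values)
df <- data.frame(f1 = c(1, 2, 3, 4),
                 f2 = c(10, 20, 30, 40),
                 group = factor(c("A", "A", "B", "B")))
sub <- data.frame(group = factor(c("A", "B")),
                  f1 = c(1, 2),
                  f2 = c(10, 20))

# Feature columns are everything in sub except the key
feature_cols <- setdiff(names(sub), "group")

# For each row of df, the index of the row in sub with the same group
idx <- match(df$group, sub$group)

# Expand sub to one row per row of df, then subtract column by column
result <- df
result[feature_cols] <- df[feature_cols] - sub[idx, feature_cols]
```

Applied to the question's data, the assumption is that df_repr would take the place of df and to_subtract the place of sub.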
Another approach, using dplyr::group_modify:
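library(dplyr)

df_repr %>%
  group_by(group) %>%
  # .x is the data for one group, .y is a one-row tibble holding its key
  group_modify(~ mutate(.x, across(f1:f3, \(val) {
    # subtract the value for this column from the matching row of to_subtract
    val - filter(to_subtract, group == .y$group)[[cur_column()]]
  }))) %>%
  ungroup()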
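To see why .y$group can be used as the lookup key here, it may help to inspect what group_modify passes to the function for each group. The following is a minimal sketch on made-up data; toy and summary_tbl are hypothetical names used only for illustration.

```r
library(dplyr)

# Hypothetical toy data, only to show what .x and .y contain
toy <- tibble(value = c(1, 2, 3, 4),
              group = factor(c("A", "A", "A", "B")))

summary_tbl <- toy %>%
  group_by(group) %>%
  group_modify(~ tibble(
    key    = as.character(.y$group),  # group label, taken from the key tibble .y
    n_rows = nrow(.x)                 # number of rows of this group, from .x
  )) %>%
  ungroup()
```

Each call receives the rows of one group as .x (without the grouping column) and the group's key as .y, which is what lets the answer above pick the matching row of to_subtract.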