We would like to know how to subtract columns pairwise. Specifically, we want to subtract consecutive columns of the data frame as follows: u_2018 - u_2019, u_2019 - u_2020, u_2020 - u_2021, and u_2021 - u_2022.
We looked for an optimal solution in several Stack Overflow questions, such as "How to subtract two columns using tidyverse mutate with columns named by external variables", but the only approach we got working is the undesired one shown in the R code below. The R version is 4.2.0 and the dplyr package version is 1.0.9.
> dat
# A tibble: 6 × 5
  u_2018 u_2019 u_2020 u_2021 u_2022
   <int>  <int>  <int>  <int>  <int>
1  90035  88015  76135  50725  16517
2     20     NA     NA  13792  12793
3    555    620  15032  19309   6479
4  11171  11782  10281   8974   3901
5     NA 116896  40169  13191   3610
library(dplyr)

# Current approach: every pair of columns is written out by hand
dat %>%
  mutate(
    diff_2018_2019 = u_2018 - u_2019,
    diff_2019_2020 = u_2019 - u_2020,
    diff_2020_2021 = u_2020 - u_2021,
    diff_2021_2022 = u_2021 - u_2022)
We would like to know the best way to subtract each pair of consecutive columns without writing every pair out by hand. Maybe one approach uses mutate_at() or across() to compute the differences and save them in new columns.
Thanks in advance.
You could use pmap_dfr() + diff():
library(purrr)
pmap_dfr(df, ~ -diff(c(...))) %>%
  set_names(~ as.integer(sub('u_', '', .x)) %>% paste('dif', . - 1, ., sep = '_'))
# # A tibble: 5 × 4
#   dif_2018_2019 dif_2019_2020 dif_2020_2021 dif_2021_2022
#           <int>         <int>         <int>         <int>
# 1          2020         11880         25410         34208
# 2            NA            NA            NA           999
# 3           -65        -14412         -4277         12830
# 4          -611          1501          1307          5073
# 5            NA         76727         26978          9581
Note that diff() by default subtracts each element from the one that follows it (x[i + 1] - x[i]), so you need to flip the sign with -diff() to get the earlier year minus the later year.
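To see the sign convention in isolation, here is a small sketch on the first three values of row 1. The vector name x is only illustrative and is not part of the original code.

```r
# First three values of row 1; `x` is a hypothetical name for this sketch.
x <- c(u_2018 = 90035L, u_2019 = 88015L, u_2020 = 76135L)

later_minus_earlier <- diff(x)    # 88015 - 90035, 76135 - 88015
earlier_minus_later <- -diff(x)   # 90035 - 88015, 88015 - 76135

later_minus_earlier
earlier_minus_later
```

The second result matches the first two entries of row 1 in the output above (2020 and 11880).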
df <- structure(list(u_2018 = c(90035L, 20L, 555L, 11171L, NA), u_2019 = c(88015L,
NA, 620L, 11782L, 116896L), u_2020 = c(76135L, NA, 15032L, 10281L, 40169L),
u_2021 = c(50725L, 13792L, 19309L, 8974L, 13191L), u_2022 = c(16517L,
12793L, 6479L, 3901L, 3610L)), class = "data.frame", row.names = c("1", "2", "3", "4", "5"))
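As a possible alternative, offered only as a sketch and not as part of the answer above, base R can subtract two equally shaped data frames element-wise. This assumes the columns are already in year order and all follow the u_<year> naming. The names left, right, and res are hypothetical, and the data is rebuilt so the block runs on its own.

```r
# Same values as `df` above, rebuilt here so the sketch is self-contained.
df <- data.frame(
  u_2018 = c(90035L, 20L, 555L, 11171L, NA),
  u_2019 = c(88015L, NA, 620L, 11782L, 116896L),
  u_2020 = c(76135L, NA, 15032L, 10281L, 40169L),
  u_2021 = c(50725L, 13792L, 19309L, 8974L, 13191L),
  u_2022 = c(16517L, 12793L, 6479L, 3901L, 3610L)
)

left  <- df[-ncol(df)]  # u_2018 ... u_2021 (every column but the last)
right <- df[-1]         # u_2019 ... u_2022 (every column but the first)

# Element-wise subtraction of two data frames; NA propagates as usual.
res <- left - right

# Build names like dif_2018_2019 from the two year suffixes.
names(res) <- paste("dif",
                    sub("u_", "", names(left)),
                    sub("u_", "", names(right)),
                    sep = "_")
res
```

This avoids row-wise iteration, but it relies on column position, so the columns must be sorted by year beforehand.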