I have many columns whose names always start with one of three prefixes: `n_` for the number of students, `score_` for the proportion of students who passed, and `loc_` for the room number.
I want to multiply each `n_` column by its matching `score_` column (`n_math * score_math`, `n_sci * score_sci`, etc.) and store the results in new columns named `n_*_success`, i.e. the number of students who passed each class.
If I had only a few columns, as in this sample dataset, I would write something like this for each subject:
```r
mutate(n_sci_success = n_sci * score_sci)
```
But I have many subjects, so I'd like a single expression that matches the columns by name.
I think I need `across()` with a selection helper (something like `across(starts_with("n_"))`), but I can't figure out how to pair each `n_` column with its `score_` column. Any help would be much appreciated!
Here's a sample dataset:
```r
library(tidyverse)
test <- tibble(id = 1:4,
               n_sci = c(10, 20, 30, 40),
               score_sci = c(1, .9, .75, .7),
               loc_sci = c(1, 2, 3, 4),
               n_math = c(100, 50, 40, 30),
               score_math = c(.5, .6, .7, .8),
               loc_math = c(4, 3, 2, 1),
               n_hist = c(10, 50, 30, 20),
               score_hist = c(.5, .5, .9, .9),
               loc_hist = c(2, 1, 4, 3))
```
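Here's one way using `across()` and the new `pick()` function from dplyr 1.1.0. It multiplies each `n_` column by the matching `score_` and `loc_` columns. The columns are paired by position, so each prefix group must list the subjects in the same order.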
```r
library(dplyr)

out <- test %>%
  mutate(across(starts_with('n_'), .names = 'res_{col}') *
           pick(starts_with('score_')) * pick(starts_with('loc_')))

out %>% select(starts_with('res'))
#  res_n_sci res_n_math res_n_hist
#      <dbl>      <dbl>      <dbl>
#1      10          200         10
#2      36           90         25
#3      67.5         56        108
#4     112           24         54
```
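The question itself asks only for `n_ * score_`, stored as `n_*_success`. A minimal sketch, assuming dplyr 1.1.0 or later and the same positional pairing as above, drops the `pick(starts_with('loc_'))` factor and changes the `.names` template to `"{col}_success"`:

```r
library(dplyr)

# Sample data from the question
test <- tibble(id = 1:4,
               n_sci = c(10, 20, 30, 40),
               score_sci = c(1, .9, .75, .7),
               loc_sci = c(1, 2, 3, 4),
               n_math = c(100, 50, 40, 30),
               score_math = c(.5, .6, .7, .8),
               loc_math = c(4, 3, 2, 1),
               n_hist = c(10, 50, 30, 20),
               score_hist = c(.5, .5, .9, .9),
               loc_hist = c(2, 1, 4, 3))

# The n_* columns (renamed to n_*_success by .names) times the score_* columns.
# Pairing is by position: n_sci with score_sci, n_math with score_math, ...
success <- test %>%
  mutate(across(starts_with("n_"), .names = "{col}_success") *
           pick(starts_with("score_")))

success %>% select(ends_with("_success"))
```

For the sample data, `n_sci_success` should come out as 10, 18, 22.5, 28 (20 * 0.9 = 18, 30 * 0.75 = 22.5, and so on), and the original `n_` and `score_` columns are left untouched.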
This should also work if you replace every `pick()` with `across()`. `pick()` is meant for simply selecting columns, while `across()` is meant for applying a function to the selected columns.
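A sketch of the `across()`-only variant, assuming dplyr 1.1.0 or later, where calling `across()` without `.fns` still returns the selected columns unchanged (dplyr now recommends `pick()` for that use):

```r
library(dplyr)

# Sample data from the question
test <- tibble(id = 1:4,
               n_sci = c(10, 20, 30, 40),
               score_sci = c(1, .9, .75, .7),
               loc_sci = c(1, 2, 3, 4),
               n_math = c(100, 50, 40, 30),
               score_math = c(.5, .6, .7, .8),
               loc_math = c(4, 3, 2, 1),
               n_hist = c(10, 50, 30, 20),
               score_hist = c(.5, .5, .9, .9),
               loc_hist = c(2, 1, 4, 3))

# Same product as the pick() version, with every selection done by across()
out_across <- test %>%
  mutate(across(starts_with("n_"), .names = "res_{col}") *
           across(starts_with("score_")) * across(starts_with("loc_")))

out_across %>% select(starts_with("res"))
```

The `res_` columns should match the `pick()` output shown earlier.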
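I use `across()` in the first case (with `starts_with('n_')`) because I want to give the new columns unique names via `.names`, an argument that `pick()` does not have. The product keeps the column names of its left-hand operand, which is why the results come out as `res_*`.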
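If the prefix groups might not list the subjects in the same order, one alternative (a plain base R loop swapped in here, not part of the answer above) is to pair columns by their subject suffix rather than by position. This sketch assumes every `n_<subject>` column has a `score_<subject>` partner; `subjects` is just an illustrative variable name:

```r
library(dplyr)

# Sample data from the question
test <- tibble(id = 1:4,
               n_sci = c(10, 20, 30, 40),
               score_sci = c(1, .9, .75, .7),
               loc_sci = c(1, 2, 3, 4),
               n_math = c(100, 50, 40, 30),
               score_math = c(.5, .6, .7, .8),
               loc_math = c(4, 3, 2, 1),
               n_hist = c(10, 50, 30, 20),
               score_hist = c(.5, .5, .9, .9),
               loc_hist = c(2, 1, 4, 3))

# Subject suffixes taken from the n_* column names: "sci", "math", "hist"
subjects <- sub("^n_", "", grep("^n_", names(test), value = TRUE))

# Look up each n_<subject> / score_<subject> pair by name,
# so the column order in the data does not matter
for (s in subjects) {
  test[[paste0("n_", s, "_success")]] <-
    test[[paste0("n_", s)]] * test[[paste0("score_", s)]]
}

test %>% select(ends_with("_success"))
```

The trade-off is a loop instead of a single `mutate()` call, in exchange for not relying on column order.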