I need to pivot a tidy dataset into a wider format based on two set of keys using dplyr. I am not great with the terminology of pivoting so please excuse if 'keys' was not the right term. Here is some toy data to illustrate. These are data from two fictional participants: three different measures taken once on each of four days. For each measure we have a total score summed across the four days, expressed in the 'tot' column. This value will be constant over the four days for each measure.
library(dplyr)
df <- data.frame(id = rep(c("DFE3",
"DFE76"),
each = 12),
measure = rep(letters[1:3],
each = 4,
length.out = 24),
day = rep(1:4,
times = 3,
length.out = 24),
score = sample(0:5,
24,
replace = T)) %>%
arrange(id,measure,day) %>%
group_by(id, measure) %>%
mutate(tot = sum(score)) %>%
ungroup
df
# # A tibble: 24 x 5
# id measure day score tot
# <fct> <fct> <int> <int> <int>
# 1 DFE3 a 1 5 12
# 2 DFE3 a 2 2 12
# 3 DFE3 a 3 5 12
# 4 DFE3 a 4 0 12
# 5 DFE3 b 1 1 9
# 6 DFE3 b 2 2 9
# 7 DFE3 b 3 5 9
# 8 DFE3 b 4 1 9
# 9 DFE3 c 1 0 15
# 10 DFE3 c 2 5 15
# # i 14 more rows
# # i Use `print(n = ...)` to see more rows
Now what I want to do is pivot so that I get one column for each measure
and week
for the score
column AND one column for each measure only for the tot
column.
When I ran this code...
df %>%
pivot_wider(names_from = c(measure,
day),
values_from = c(score, tot))
# A tibble: 2 x 25
# id score_a_1 score_a_2 score_a_3 score_a_4 score_b_1 score_b_2 score_b_3 score_b_4 score_c_1 score_c_2 score_c_3 score_c_4 tot_a_1
# <fct> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
# 1 DFE3 2 1 3 3 4 4 5 0 2 0 3 5 9
# 2 DFE76 1 4 4 2 1 2 2 4 2 3 2 5 11
# # i 11 more variables: tot_a_2 <int>, tot_a_3 <int>, tot_a_4 <int>, tot_b_1 <int>, tot_b_2 <int>, tot_b_3 <int>, tot_b_4 <int>,
# # tot_c_1 <int>, tot_c_2 <int>, tot_c_3 <int>, tot_c_4 <int>
...it gave me what I wanted for the score variable - spread across measure
and week
- but it did the same thing for the tot
column, which is not what I want (there should only be three columns per id, one for each measure.
Is there any way to do these processes simultaneously using pivot_wider
?
This might be simplest to think of as two tables keyed to id
, one pivoted wide reflecting measure and day, the other pivoted just using measure, and then joined together based on id
:
library(dplyr); library(tidyr)
df %>%
select(-tot) %>%
pivot_wider(names_from = c(measure, day), values_from = score,
names_prefix = "score_") %>%
left_join(df %>%
distinct(id, measure, tot) %>%
pivot_wider(names_from = measure, values_from = tot,
names_prefix = "tot_"))
Result
Joining with `by = join_by(id)`
# A tibble: 2 × 16
id score_a_1 score_a_2 score_a_3 score_a_4 score_b_1 score_b_2 score_b_3 score_b_4 score_c_1 score_c_2 score_c_3 score_c_4 tot_a tot_b tot_c
<chr> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1 DFE3 0 4 0 4 4 0 3 2 5 2 5 1 8 9 13
2 DFE76 4 5 5 5 2 2 3 5 4 4 2 2 19 12 12