I have a dataframe that looks like this:
example <- data.frame(
date = as.Date(c('2001-01-01',
'2001-01-02',
'2001-01-01',
'2001-01-02')),
PID_A = c(1091, 1091, 1037, 1037),
PID_B = c(2091, 2091, 2037, 2037),
resp_A = c(3,1,2,4),
resp_B = c(2,4,3,1),
xxx_connect_A = c(6,2,5,3),
xxx_connect_B = c(5,3,6,2)
)
I want to collapse across _As and _Bs to form one column for each variable.
Here is the desired output:
example_solution <- data.frame(
date = as.Date(rep(c('2001-01-01',
'2001-01-02'),4)),
PID = c(1091, 1091, 2091, 2091, 1037, 1037, 2037, 2037),
resp = c(3,1,2,4,2,4,3,1),
xxx_connect = c(6,2,5,3,5,3,6,2)
)
I tried this code from a previous stackoverflow answer but it doesn't work as well now because my variables have differing number of underscores.
example |>
pivot_longer(-date, names_to = c(".value", "who"), names_sep = "_")
Does anyone know an efficient way to fix this?
Thank you!
Instead of names_sep
, you can use names_pattern
with a regular expression to split the data
example |>
pivot_longer(-date, names_to = c(".value", "who"), names_pattern = "(.+)_([^_]+)$")
# date who PID resp xxx_connect
# <date> <chr> <dbl> <dbl> <dbl>
# 1 2001-01-01 A 1091 3 6
# 2 2001-01-01 B 2091 2 5
# 3 2001-01-02 A 1091 1 2
# 4 2001-01-02 B 2091 4 3
# 5 2001-01-01 A 1037 2 5
# 6 2001-01-01 B 2037 3 6
# 7 2001-01-02 A 1037 4 3
# 8 2001-01-02 B 2037 1 2
The regular expression looks for the last "_" that's at the end of the string and splits there.