Let's say I have a data frame with 3 ID columns and one column of interest. Each row represents one observation. Some IDs have multiple observations, i.e., multiple rows.
df <- data.frame(id1 = c(  1,   2,   3,   4,   4),
                 id2 = c( 11,  12,  13,  14,  14),
                 id3 = c(111, 112, 113, 114, 114),
                 variable_of_interest = c(13, 24, 35, 31, 12))
  id1 id2 id3 variable_of_interest
1   1  11 111                   13
2   2  12 112                   24
3   3  13 113                   35
4   4  14 114                   31
5   4  14 114                   12
My goal is to restructure it in order to have one row per ID, keep the 3 ID columns, and name the new columns "variable_of_interest1", "variable_of_interest2":
  id1 id2 id3 variable_of_interest1 variable_of_interest2
1   1  11 111                    13                    NA
2   2  12 112                    24                    NA
3   3  13 113                    35                    NA
4   4  14 114                    31                    12
The solution might need reshape2 and the dcast function, but so far I have not been able to work it out.
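We can create a sequence index grouped by the 'id' columns (1 for the first row of each ID combination, 2 for the second, and so on), paste it onto the column name, and then reshape to wide format with pivot_wider: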
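library(dplyr)
library(stringr)
library(tidyr)
library(data.table)  # only needed here for rowid()
df %>%
  # build the new column names: variable_of_interest1, variable_of_interest2, ...
  mutate(ind = str_c('variable_of_interest', rowid(id1, id2, id3))) %>%
  # spread the values into one column per name
  pivot_wider(names_from = ind, values_from = variable_of_interest)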
-output
# A tibble: 4 x 5
# id1 id2 id3 variable_of_interest1 variable_of_interest2
# <dbl> <dbl> <dbl> <dbl> <dbl>
#1 1 11 111 13 NA
#2 2 12 112 24 NA
#3 3 13 113 35 NA
#4 4 14 114 31 12
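If you would rather not load data.table just for rowid(), the same index can be built with dplyr alone. This is a minimal sketch, assuming the same df as in the question; the intermediate column name ind and the result name wide are only illustrative. It uses group_by() plus row_number() for the per-ID counter and the names_prefix argument of pivot_wider to build the column names:

```r
library(dplyr)
library(tidyr)

# Same example data as in the question
df <- data.frame(id1 = c(1, 2, 3, 4, 4),
                 id2 = c(11, 12, 13, 14, 14),
                 id3 = c(111, 112, 113, 114, 114),
                 variable_of_interest = c(13, 24, 35, 31, 12))

wide <- df %>%
  group_by(id1, id2, id3) %>%
  mutate(ind = row_number()) %>%   # 1, 2, ... within each ID combination
  ungroup() %>%
  pivot_wider(names_from = ind,
              values_from = variable_of_interest,
              names_prefix = "variable_of_interest")

wide
```

ID combinations with fewer observations than the maximum are filled with NA in the extra columns, as in the output above.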
Another option is dcast from data.table (note that setDT converts df to a data.table by reference):
library(data.table)
dcast(setDT(df), id1 + id2 + id3 ~
        paste0('variable_of_interest', rowid(id1, id2, id3)),
      value.var = 'variable_of_interest')
-output
# id1 id2 id3 variable_of_interest1 variable_of_interest2
#1: 1 11 111 13 NA
#2: 2 12 112 24 NA
#3: 3 13 113 35 NA
#4: 4 14 114 31 12
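For completeness, the same idea (a per-ID counter, then a long-to-wide reshape) can probably also be done without any packages. The following is only a sketch in base R, assuming the df from the question; the helper column obs and the result name wide are names I made up for illustration. ave() builds the counter and reshape() with sep = "" appends it directly to the column name:

```r
# Same example data as in the question
df <- data.frame(id1 = c(1, 2, 3, 4, 4),
                 id2 = c(11, 12, 13, 14, 14),
                 id3 = c(111, 112, 113, 114, 114),
                 variable_of_interest = c(13, 24, 35, 31, 12))

# Counter within each id1/id2/id3 combination: 1, 1, 1, 1, 2
df$obs <- ave(seq_along(df$id1), df$id1, df$id2, df$id3, FUN = seq_along)

# Long to wide: one column per value of obs
wide <- reshape(df,
                idvar = c("id1", "id2", "id3"),
                timevar = "obs",
                direction = "wide",
                sep = "")

wide
```

The syntax of reshape() is less readable than pivot_wider or dcast, so I would only reach for it when adding a dependency is not an option.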