I am slowly working through a data transformation using R and the dplyr package. I started with unique rows per respondent. The data come from a conjoint experiment, so I have needed to head toward profiles (profile A or B in the experiment) nested in experimental iteration (each respondent took the experiment 5 times) nested in respondent ID.
I have successfully transformed the data to get experiments nested in respondent IDs. Now I have multiple columns X1-Xn that contain the attribute features. However, these effectively repeat attributes at this point, with, say, X1 including a variable for profile A in the experiment and X6 including the same variable but for profile B.
In the mocked image example below, I would basically need to merge columns v1a and v1b as just v1, v2a and v2b as just v2 and so forth, while generating a new column that delimits if they are from a or b.
Following up on the comments and helpful but not quite what is needed engagement with this original post, I have edited the post to include simple code for both the original data structure and the ideal outcome data:
#original dataframe
ID <- c(1, 1, 1, 2, 2, 2)
`Ex ID` <- c(1, 2, 3, 1, 2, 3)
v1a <- c(2, 4, 5, 1, 3, 5)
v2a = c(3, 4, 5, 2, 1, 5)
v3a = c(5, 4, 3, 3, 2, 1)
v1b = c(4, 5, 5, 1, 5, 4)
v2b = c(5, 2, 2, 4, 1, 4)
v3b = c(5, 5, 4, 5, 4, 5)
original <- data.frame(ID, 'Ex ID' , v1a, v2a, v3a, v1b, v2b,
v3b)
#wanted data frame
ID <- c(1, 1, 1, 1, 1, 1)
`Ex ID` <- c(1, 1, 2, 2, 3, 3)
profile <- c("a", "b", "a", "b", "a", "b")
v1ab = c(2, 4, 4, 5, 5, 5)
v2ab = c(3, 5, 4, 2, 5, 2)
v3ab = c(5, 5, 4, 5, 3, 4)
desired <- data.frame(ID, 'Ex ID', profile, v1ab, v2ab, v3ab)
I basically want to find a way to nest multiple variables within ID, experiment ID, profile IDs.
Any guidance would be greatly appreciated.
We could do this with base R using sapply
:
cols <- split(names(df)[-c(1,2)], substr(names(df)[-c(1,2)], start = 1, stop = 2))
cbind(df[c(1,2)], sapply(names(cols), function(col) {
do.call(paste, c(df[cols[[col]]], sep = ","))
}))
Output:
ID Ex_ID v1 v2
1 1 1 2,4 3,5
2 1 2 4,5 4,2
3 1 3 5,5 5,2
4 2 1 1,1 2,4
5 2 2 3,5 1,1
6 2 3 5,4 5,4
7 3 1 4,4 2,5
8 3 2 1,1 5,4
9 3 3 4,5 1,2
data:
df <- tibble(ID = c(1, 1, 1, 2, 2, 2, 3, 3, 3), Ex_ID = c(1,
2, 3, 1, 2, 3, 1, 2, 3), v1a = c(2, 4, 5, 1, 3, 5, 4, 1, 4),
v2a = c(3, 4, 5, 2, 1, 5, 2, 5, 1), v1b = c(4, 5, 5, 1, 5,
4, 4, 1, 5), v2b = c(5, 2, 2, 4, 1, 4, 5, 4, 2))