Search code examples
rdataframedplyrpivottransformation

Data transformation to experimental profiles, nested experiments, nested in respondents in R


I am slowly working through a data transformation using R and the dplyr package. I started with unique rows per respondent. The data come from a conjoint experiment, so I have needed to head toward profiles (profile A or B in the experiment) nested in experimental iteration (each respondent took the experiment 5 times) nested in respondent ID.

I have successfully transformed the data to get experiments nested in respondent IDs. Now I have multiple columns X1-Xn that contain the attribute features. However, these effectively repeat attributes at this point, with, say, X1 including a variable for profile A in the experiment and X6 including the same variable but for profile B.

In the mocked image example below, I would basically need to merge columns v1a and v1b as just v1, v2a and v2b as just v2 and so forth, while generating a new column that delimits if they are from a or b.

Following up on the comments and helpful but not quite what is needed engagement with this original post, I have edited the post to include simple code for both the original data structure and the ideal outcome data:

#original dataframe    
ID <- c(1, 1, 1, 2, 2, 2) 
`Ex ID` <- c(1, 2, 3, 1, 2, 3) 
v1a <- c(2, 4, 5, 1, 3, 5) 
v2a = c(3, 4, 5, 2, 1, 5) 
v3a = c(5, 4, 3, 3, 2, 1) 
v1b = c(4, 5, 5, 1, 5, 4) 
v2b = c(5, 2, 2, 4, 1, 4) 
v3b = c(5, 5, 4, 5, 4, 5)
original <- data.frame(ID, 'Ex ID' , v1a, v2a, v3a, v1b, v2b, 
v3b)

#wanted data frame
ID <- c(1, 1, 1, 1, 1, 1)
`Ex ID` <- c(1, 1, 2, 2, 3, 3)
profile <- c("a", "b", "a", "b", "a", "b")
v1ab = c(2, 4, 4, 5, 5, 5)
v2ab = c(3, 5, 4, 2, 5, 2)
v3ab = c(5, 5, 4, 5, 3, 4)
desired <- data.frame(ID, 'Ex ID', profile, v1ab, v2ab, v3ab)

I basically want to find a way to nest multiple variables within ID, experiment ID, profile IDs.

Any guidance would be greatly appreciated.


Solution

  • We could do this with base R using sapply:

    cols <- split(names(df)[-c(1,2)], substr(names(df)[-c(1,2)], start = 1, stop = 2))
    
    cbind(df[c(1,2)], sapply(names(cols), function(col) {
        do.call(paste, c(df[cols[[col]]], sep = ","))
    }))
    

    Output:

      ID Ex_ID  v1  v2
    1  1     1 2,4 3,5
    2  1     2 4,5 4,2
    3  1     3 5,5 5,2
    4  2     1 1,1 2,4
    5  2     2 3,5 1,1
    6  2     3 5,4 5,4
    7  3     1 4,4 2,5
    8  3     2 1,1 5,4
    9  3     3 4,5 1,2
    

    data:

    df <- tibble(ID = c(1, 1, 1, 2, 2, 2, 3, 3, 3), Ex_ID = c(1, 
    2, 3, 1, 2, 3, 1, 2, 3), v1a = c(2, 4, 5, 1, 3, 5, 4, 1, 4), 
    v2a = c(3, 4, 5, 2, 1, 5, 2, 5, 1), v1b = c(4, 5, 5, 1, 5, 
    4, 4, 1, 5), v2b = c(5, 2, 2, 4, 1, 4, 5, 4, 2))