Search code examples
rdplyrdata-cleaningstringr

Building separated columns for each category from columns that categories are mixed


I have a survey dataframe that contains evaluations of three categories. I do want to see evaluations of each category in each column, but it's mixed like this,

ID  q1 q2 q3 q4 q5 q6
1   c2 e1 c1 e1 c3 e2
2   c1 e1 c3 e2 c2 e0
3   c3 e0 c2 e2 c1 e1

What I want to get is:

ID  c1 c2 c3
1   e1 e1 e2
2   e1 e0 e2
3   e1 e2 e0

However, trying several ways via stringr and dplyr, now I'm just stuck here:

ID  a1    a2    a3
1   c1-e1 c2-e1 c3-e2
2   c1-e1 c2-e0 c3-e2
3   c1-e1 c2-e2 c3-e0

Is there anyone who knows how to fix this problem? Thank you in advance!


Solution

  • Looks like a stack to long, and then a reshape to wide format will accomplish this. There are about 3.2 million variations on this logic, but here's one:

    tmp <- reshape(dat, idvar="ID", varying=list(c(2,4,6),c(3,5,7)),
                   v.names=c("name","value"), direction="long", timevar=NULL)
    
    #    ID name value
    #1.1  1   c2    e1
    #2.1  2   c1    e1
    #3.1  3   c3    e0
    #1.2  1   c1    e1
    #2.2  2   c3    e2
    #3.2  3   c2    e2
    #1.3  1   c3    e2
    #2.3  2   c2    e0
    #3.3  3   c1    e1
    

    This long format may even be usable for modelling purposes. If you definitely require the wide format, reshape again:

    reshape(tmp, idvar="ID", timevar="name", direction="wide")
    
    #    ID value.c2 value.c1 value.c3
    #1.1  1       e1       e1       e2
    #2.1  2       e0       e1       e2
    #3.1  3       e2       e1       e0