Reshape binomial data to long bernoulli format


I am coming back to R after a year and want to use rpart for a classification tree.

My data looks like:

Category, Shape, Color, Yes, No
A, Square, Blue, 3, 2
B, Triangle, Blue, 2, 4
etc. 

Any recommendations for reshaping it into the format below so I can use rpart? (I believe rpart needs one row per observation.)

ID, Shape, Color, Result
A, Square, Blue, Yes
A, Square, Blue, Yes
A, Square, Blue, Yes
A, Square, Blue, No
A, Square, Blue, No
B, Triangle, Blue, Yes
etc...

Thank you!


Solution

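  • You can use melt from reshape2 to stack the Yes and No counts, then use rep to repeat each row as many times as its count:

    library(reshape2)
    s <- melt(df, id.vars = c('Category', 'Shape', 'Color'))
    # repeat each row as many times as its count in `value`
    s[rep(seq_len(nrow(s)), s$value), ]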
                  Category     Shape Color variable value
    1                    A    Square  Blue      Yes     3
    1.1                  A    Square  Blue      Yes     3
    1.2                  A    Square  Blue      Yes     3
    2                    B  Triangle  Blue      Yes     2
    2.1                  B  Triangle  Blue      Yes     2
    3                    A    Square  Blue       No     2
    3.1                  A    Square  Blue       No     2
    4                    B  Triangle  Blue       No     4
    4.1                  B  Triangle  Blue       No     4
    4.2                  B  Triangle  Blue       No     4
    4.3                  B  Triangle  Blue       No     4
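
The output above still carries the count column and uses the name variable rather than Result. A minimal base R sketch that goes all the way to the layout asked for in the question might look like the following. The data frame df is rebuilt here from the two rows shown in the question, and the names long, n, and bern are illustrative only, not part of the original answer.

```r
# Example data reconstructed from the question (only the two rows shown there).
df <- data.frame(
  Category = c("A", "B"),
  Shape    = c("Square", "Triangle"),
  Color    = c("Blue", "Blue"),
  Yes      = c(3, 2),
  No       = c(2, 4)
)

id_cols <- c("Category", "Shape", "Color")

# Stack the Yes and No counts into one long table (base R, no reshape2 needed).
# `n` is a hypothetical name for the count column.
long <- rbind(
  data.frame(df[id_cols], Result = "Yes", n = df$Yes),
  data.frame(df[id_cols], Result = "No",  n = df$No)
)

# Repeat each row n times, then keep only the columns wanted for modelling.
bern <- long[rep(seq_len(nrow(long)), long$n), c(id_cols, "Result")]
rownames(bern) <- NULL

# A classification tree needs a categorical response.
bern$Result <- factor(bern$Result, levels = c("No", "Yes"))

print(table(bern$Category, bern$Result))
```

The expanded data can then be passed to rpart, for example rpart(Result ~ Shape + Color, data = bern, method = "class"). As a possible alternative, rpart also accepts a weights argument, so something like rpart(Result ~ Shape + Color, data = long, weights = n, method = "class") may avoid expanding the rows at all; whether the two fits agree exactly depends on control settings such as minsplit, so treat that as something to check rather than a guarantee.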