Search code examples
rreshapemelt

find what variables changed per value of another variable


Given the following data.frame:

t   x   y
---------
1   1   3
1   1   3
1   1   2
2   1   2
2   2   2

I would like output of the form

t   cnt cux cuy
---------------
1   3   1   2
2   2   2   1

where cnt is the count of all rows with a particular value t, cux/cuy is the count of all unique rows of x/y

The other constraint is that the answer must work for a variable number of columns.

Thanks.


Solution

  • What you describe in words and what you show in expected output do not agree. In particular, counting unique values of y would be 2 and 1, not 3 and 2 based on your input. Going with the written description:

    DF <- data.frame(t=c(1,1,1,2,2), x=c(1,1,1,1,2), y=c(3,3,2,2,2))
    
    library("plyr")
    
    ddply(DF, .(t), function(DF) {
        data.frame(cnt=length(DF$t), colwise(function(x) {length(unique(x))})(DF))
    })
    

    Or if you want something really functional looking:

    library("functional")
    
    ddply(DF, .(t), function(DF) {
        data.frame(cnt=length(DF$t), colwise(Compose(unique, length))(DF))
    })
    

    Or going completely overboard with the functional paradigm:

    merge(ddply(DF, .(t), summarise, cnt=length(t)),
        ddply(DF, .(t), colwise(Compose(unique, length))))
    

    None of these give the column names you asked for; instead of cux it is x. However, they can be changed afterward.

    res <-
    merge(ddply(DF, .(t), summarise, cnt=length(t)),
        ddply(DF, .(t), colwise(Compose(unique, length))))
    
    names(res)[-(1:2)] <- paste("cu", names(DF)[-1], sep="")
    

    which gives

    > res
      t cnt cux cuy
    1 1   3   1   2
    2 2   2   2   1