find what variables changed per value of another variable

Given the following data.frame:

I would like output of the form

t   cnt cux cuy
---------------
1   3   1   2
2   2   2   1

where cnt is the count of all rows with a particular value t, cux/cuy is the count of all unique rows of x/y

The other constraint is that the answer must work for a variable number of columns.

Thanks.

Solution

What you describe in words and what you show in expected output do not agree. In particular, counting unique values of y would be 2 and 1, not 3 and 2 based on your input. Going with the written description:

DF <- data.frame(t=c(1,1,1,2,2), x=c(1,1,1,1,2), y=c(3,3,2,2,2))

library("plyr")

ddply(DF, .(t), function(DF) {
    data.frame(cnt=length(DF$t), colwise(function(x) {length(unique(x))})(DF))
})

Or if you want something really functional looking:

library("functional")

ddply(DF, .(t), function(DF) {
    data.frame(cnt=length(DF$t), colwise(Compose(unique, length))(DF))
})

Or going completely overboard with the functional paradigm:

merge(ddply(DF, .(t), summarise, cnt=length(t)),
    ddply(DF, .(t), colwise(Compose(unique, length))))

None of these give the column names you asked for; instead of cux it is x. However, they can be changed afterward.

res <-
merge(ddply(DF, .(t), summarise, cnt=length(t)),
    ddply(DF, .(t), colwise(Compose(unique, length))))

names(res)[-(1:2)] <- paste("cu", names(DF)[-1], sep="")

which gives

> res
  t cnt cux cuy
1 1   3   1   2
2 2   2   2   1