Given the following data.frame:
t x y
---------
1 1 3
1 1 3
1 1 2
2 1 2
2 2 2
I would like output of the form
t cnt cux cuy
---------------
1 3 1 2
2 2 2 1
where cnt is the count of all rows with a particular value t, cux/cuy is the count of all unique rows of x/y
The other constraint is that the answer must work for a variable number of columns.
Thanks.
What you describe in words and what you show in expected output do not agree. In particular, counting unique values of y
would be 2 and 1, not 3 and 2 based on your input. Going with the written description:
DF <- data.frame(t=c(1,1,1,2,2), x=c(1,1,1,1,2), y=c(3,3,2,2,2))
library("plyr")
ddply(DF, .(t), function(DF) {
data.frame(cnt=length(DF$t), colwise(function(x) {length(unique(x))})(DF))
})
Or if you want something really functional looking:
library("functional")
ddply(DF, .(t), function(DF) {
data.frame(cnt=length(DF$t), colwise(Compose(unique, length))(DF))
})
Or going completely overboard with the functional paradigm:
merge(ddply(DF, .(t), summarise, cnt=length(t)),
ddply(DF, .(t), colwise(Compose(unique, length))))
None of these give the column names you asked for; instead of cux
it is x
. However, they can be changed afterward.
res <-
merge(ddply(DF, .(t), summarise, cnt=length(t)),
ddply(DF, .(t), colwise(Compose(unique, length))))
names(res)[-(1:2)] <- paste("cu", names(DF)[-1], sep="")
which gives
> res
t cnt cux cuy
1 1 3 1 2
2 2 2 2 1