I had hoped to use the mode function inside ddply to find the most common string for a given user in each time period.
This is closely related to this question and this question.
I'm using a data set similar to this:
Data <- data.frame(
    groupname = as.factor(sample(c("red", "green", "blue"), 100, replace = TRUE)),
    timeblock = sample(1:10, 100, replace = TRUE),
    someuser  = sample(c("bob", "sally", "sue"), 100, replace = TRUE))
I tried:
groupnameagg <- ddply(Data, .(timeblock, groupname, someuser), summarise, groupmode = mode(groupname))
But that's not doing what I had expected. It returns:
> head(groupnameagg$groupmode)
[1] "numeric" "numeric" "numeric" "numeric" "numeric" "numeric"
What I was hoping for is something along these lines:
timeblock  username  mostcommongroupforuser
1          bob       red
1          sally     red
1          sue       green
2          bob       green
2          sally     blue
2          sue       red
I think aggregate should do the trick for both parts.
PART 1
aggregate(Data$groupname, by = list(Data$timeblock, Data$someuser),
    function(x) {
        ux <- unique(x)                        # distinct values in this group
        ux[which.max(tabulate(match(x, ux)))]  # the value that occurs most often
    })
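As a small sketch of what the function in Part 1 is doing, here it is pulled out under a hypothetical name, stat_mode (not part of base R or plyr), and run on a toy factor. It also shows why the original call produced "numeric": base R's mode() reports the storage mode of an object, not its most frequent value.

```r
# stat_mode is a hypothetical helper name, used here only for illustration
stat_mode <- function(x) {
  ux <- unique(x)                        # distinct values, in order of appearance
  ux[which.max(tabulate(match(x, ux)))]  # value with the highest count
}

# Toy input, made up for this example
x <- factor(c("red", "green", "red", "blue", "red", "green"))

mode(x)                     # "numeric": a factor is stored as integer codes
as.character(stat_mode(x))  # "red": it appears three times
```

On ties, which.max picks the first maximum, so the value seen first in the group wins.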
PART 2
aggregate(Data$groupname, by = list(Data$timeblock, Data$someuser),
    function(x) {
        # level with the highest integer code present in this group
        levels(Data$groupname)[max(as.numeric(x))]
    })
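If you'd rather stay with ddply, something along these lines might work, assuming plyr is installed. Two things differ from the call in the question: groupname is dropped from the grouping variables (otherwise every group contains a single colour), and the most frequent value is read off table() instead of mode(). The seed and the output column name mostcommongroupforuser are only there for illustration.

```r
library(plyr)

set.seed(42)  # arbitrary seed, only so the sketch is reproducible
Data <- data.frame(
  groupname = as.factor(sample(c("red", "green", "blue"), 100, replace = TRUE)),
  timeblock = sample(1:10, 100, replace = TRUE),
  someuser  = sample(c("bob", "sally", "sue"), 100, replace = TRUE))

# Group by time block and user only, then take the most frequent level per group
groupnameagg <- ddply(Data, .(timeblock, someuser), summarise,
                      mostcommongroupforuser = names(which.max(table(groupname))))

head(groupnameagg)
```

With this variant, ties go to whichever level comes first in levels(Data$groupname).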