I have a data frame of authors in R (from a much larger data set than the example below) that I'd like to get better descriptives of. I know (kind of) how to get the maxsum, but how could I get the max summary of unique authors EXCEPT for the top 2 most frequent authors, for example? How would I then determine the new maxsum? That is, how would I get the actual number for the new maxsum (3 here) rather than just the summary output?
I'm basically looking for conditional ways of summarizing my data. Can anyone help me out in this department?
dat <- data.frame(author=c("a", "b", "c", "d", "a", "b", "c", "d", "e", "a", "a", "a","a", "a", "c","c","c","c"),Post=c("one", "one", "one", "one", "one", "one", "one", "one", "one", "one","one", "one","one", "one","one", "one","one", "one"))
authors <- dat[, 1]
author_vec <- authors
length(unique(author_vec)) # 5
ex_s <- summary(as.factor(author_vec), maxsum=5)
Here is an approach using the plyr library:
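library(plyr)
# Count the number of rows (posts) per author
temp <- ddply(dat, ~author, summarise, sum=length(author))
# Sort by descending count, then drop the first two rows (the top 2 authors)
temp <- temp[order(-temp$sum), ][3:nrow(temp), ]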
> temp
  author sum
2      b   2
4      d   2
5      e   1
The authors a and c have been removed because they were the two most frequently appearing authors in the data set.
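For the part of the question about getting the new maxsum as a number, here is a minimal base R sketch that does not need plyr. It rebuilds the author vector from the question; the names n_drop, counts, remaining, new_maxsum, keep and ex_s2 are made up for illustration, and it assumes n_drop is at least 1 and smaller than the number of unique authors.

```r
# Rebuild the author vector from the question's example data
author_vec <- c("a", "b", "c", "d", "a", "b", "c", "d", "e",
                "a", "a", "a", "a", "a", "c", "c", "c", "c")

# Count posts per author, most frequent first
counts <- sort(table(author_vec), decreasing = TRUE)

# Drop the n_drop most frequent authors (assumes n_drop >= 1)
n_drop <- 2
remaining <- counts[-seq_len(n_drop)]

# Number of authors left over, usable as the new maxsum
new_maxsum <- length(remaining)

# Summary restricted to the remaining authors
keep <- author_vec[author_vec %in% names(remaining)]
ex_s2 <- summary(factor(keep), maxsum = new_maxsum)
```

With the example data, new_maxsum comes out as 3, and ex_s2 holds the counts for b, d and e only. Changing n_drop gives the same kind of summary with a different number of top authors excluded.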