Tags: r, vector, dataframe, summary

Conditional Summary in R: MaxSum


I have a data frame of authors in R (the real data set is much larger than the example below) that I'd like to get better descriptive statistics for. I know (sort of) how to use maxsum, but how could I get the summary of unique authors EXCEPT for, say, the top 2 most frequent authors? How would I then determine the new maxsum? And how would I get the actual value, i.e. that the new maxsum would be 3, rather than just reading it off the printed output?

I'm basically looking for conditional ways of summarizing my data. Can anyone help me out in this department?

dat <- data.frame(author=c("a", "b", "c", "d", "a", "b", "c", "d", "e", "a", "a", "a","a", "a", "c","c","c","c"),Post=c("one", "one", "one", "one", "one", "one", "one", "one", "one", "one","one", "one","one", "one","one", "one","one", "one"))
authors <- dat[, 1]
author_vec <- authors
length(unique(author_vec)) #5
ex_s <- summary(as.factor(author_vec), maxsum=5)

Solution

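  • Here is an approach using the plyr package:

    library(plyr)
    # count the number of rows (posts) per author
    temp <- ddply(dat, ~author, summarise, sum=length(author))
    # sort by descending count, then drop the first two rows (the top 2 authors)
    temp <- temp[order(-temp$sum), ][3:nrow(temp), ]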
    
    > temp
      author sum
    2      b   2
    4      d   2
    5      e   1
    

    The authors a and c have been removed because they were the two most frequently appearing authors in the data set.
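
    The question also asks how to get the new maxsum as a value rather than reading it off the output. A minimal base R sketch of one way to do that, assuming the same example data; the names n_drop, counts, remaining, keep and new_maxsum are made up for illustration and are not part of the answer above:

```r
# Rebuild the author column from the question's example data.
dat <- data.frame(
  author = c("a", "b", "c", "d", "a", "b", "c", "d", "e",
             "a", "a", "a", "a", "a", "c", "c", "c", "c"),
  stringsAsFactors = FALSE
)

# Count posts per author and sort from most to least frequent.
counts <- sort(table(dat$author), decreasing = TRUE)

# n_drop (hypothetical name): how many of the top authors to exclude.
n_drop <- 2
remaining <- counts[seq_along(counts) > n_drop]

# The number of authors left is the new maxsum, available as a value.
new_maxsum <- length(remaining)  # 3 for the example data

# Summarize only the posts written by the remaining authors.
keep <- dat$author[dat$author %in% names(remaining)]
ex_s <- summary(factor(keep), maxsum = new_maxsum)
```

    Indexing with seq_along(counts) > n_drop rather than a negative index keeps the sketch working when n_drop is 0, since counts[-seq_len(0)] would select nothing.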