
R summary stats by logical vector grouping


I have the following R data frames. I am trying to get summary stats from the final "score" data frame, grouped by the logical vectors.

    # Original df
    type <- c("A", "B", "C", "D", "E")
    user <- c('user1', 'user2', 'user3', 'user4', 'user5')
    text <- c('this is a tweet', 'this is a fb post', 'tweeting is fun', 'other text', 'another fb post')
    tweet.mention <- c('TRUE', 'FALSE', 'TRUE', 'FALSE', 'FALSE')
    fb.mention <- c('FALSE', 'TRUE', 'FALSE', 'FALSE', 'TRUE')
    df1 <- cbind.data.frame(type, user, text, tweet.mention, fb.mention)
    df1

    # Remove records that are all FALSE
    tweet <- as.logical(tweet.mention)
    fb <- as.logical(fb.mention)
    test <- cbind(tweet, fb)
    true <- rowSums(test)
    all <- cbind(test, true)

    # Create score df
    score <- subset(df1, true >= 1)

    # Score API return
    sentiment <- c(1, .5, 2, -2)

    # Scored text
    score <- cbind(score, sentiment)

The score data frame correctly drops record 4 and contains the numeric sentiment values. Next I would like the average sentiment score grouped by tweet.mention (expected 1.5) and fb.mention (expected -0.75). I tried summary() from base R, but it summarizes all rows together, so I think a group-by or subset is needed. I also tried describeBy() from the psych package, but that didn't help either.

To complicate matters, I won't always know how many logical columns there are, so I can't subset them manually by naming each column and testing == TRUE. I can build a list or vector of the column names to lapply() over, but I'm not sure what code or function will do the grouping.
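A minimal sketch of that lapply-style idea in base R, using sapply() so the result comes back as a named vector. It assumes the indicator columns can be found by their ".mention" suffix (a hypothetical naming convention) and that they hold the strings "TRUE"/"FALSE" as in the code above; as.logical() turns those strings into logical values for indexing. The scored rows are rebuilt directly so the snippet runs on its own.

```r
# Scored rows from the question (record 4 already removed)
score <- data.frame(
  type = c("A", "B", "C", "E"),
  user = c("user1", "user2", "user3", "user5"),
  text = c("this is a tweet", "this is a fb post", "tweeting is fun", "another fb post"),
  tweet.mention = c("TRUE", "FALSE", "TRUE", "FALSE"),
  fb.mention = c("FALSE", "TRUE", "FALSE", "TRUE"),
  sentiment = c(1, 0.5, 2, -2),
  stringsAsFactors = FALSE
)

# Assumption: every indicator column name ends in ".mention"
mention_cols <- grep("\\.mention$", names(score), value = TRUE)

# Mean sentiment over the rows where each indicator column is TRUE;
# sapply() names the result after the column names
group_means <- sapply(mention_cols, function(col) {
  mean(score$sentiment[as.logical(score[[col]])])
})
group_means  # tweet.mention 1.50, fb.mention -0.75
```

Because each column is handled independently, a record with several TRUE columns counts toward every one of those means, and a new indicator column needs no code change as long as its name matches the pattern.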

I have read the base R and psych vignettes and checked the R Cookbook, but couldn't find an answer. Any help is greatly appreciated.


Solution

  • Two methods using base R:

    > with(score, tapply(sentiment, list(tweet.mention, fb.mention), mean))
          FALSE  TRUE
    FALSE    NA -0.75
    TRUE    1.5    NA
    

    and:

    > aggregate(sentiment~tweet.mention+fb.mention, data=score, mean)
      tweet.mention fb.mention sentiment
    1          TRUE      FALSE      1.50
    2         FALSE       TRUE     -0.75
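Both methods above group by the combination of indicator values, so a record with more than one TRUE would fall into its own (TRUE, TRUE) cell instead of counting toward each channel; with this data every kept record has exactly one TRUE, so the results match the per-channel means. A sketch that reshapes the indicator columns into long format and then calls aggregate() works for any number of indicator columns and counts overlapping records in every group they belong to. The "channel" column name and the ".mention" suffix used to find the indicators are assumptions for illustration.

```r
# Relevant columns of the scored data from the question
score <- data.frame(
  tweet.mention = c("TRUE", "FALSE", "TRUE", "FALSE"),
  fb.mention = c("FALSE", "TRUE", "FALSE", "TRUE"),
  sentiment = c(1, 0.5, 2, -2),
  stringsAsFactors = FALSE
)

# Assumption: indicator columns are identified by the ".mention" suffix
mention_cols <- grep("\\.mention$", names(score), value = TRUE)

# One row per (record, indicator) pair; "channel" is a hypothetical column name
long <- do.call(rbind, lapply(mention_cols, function(col) {
  data.frame(
    channel = col,
    mentioned = as.logical(score[[col]]),
    sentiment = score$sentiment,
    stringsAsFactors = FALSE
  )
}))

# Keep only the pairs where the indicator is TRUE, then average per channel
channel_means <- aggregate(sentiment ~ channel, data = long[long$mentioned, ], FUN = mean)
channel_means
```

Other summary statistics can be obtained by passing a different function as FUN, for example median.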