
Extracting factor labels from table result for use as data frame column


I am doing clickstream logfile summaries where each raw input row has the format UID=character, Win/Lose=Boolean. The summary I want has one row per UID, of the form UID, sumWin, sumLose. I have used table to get part of what I want, but I cannot work out the right syntax for extracting the factor labels from a table result for use in the summary df. The example below builds up a tiny test case and shows where I get stuck: I cannot get at the factor labels from the table result. (Of course, if you think there's a much better way of going about the whole thing, that would be hugely useful too!)

  foo <- data.frame(Uid=character(4), Win=logical(4), stringsAsFactors=FALSE)
  foo$Uid <- c("UidA", "UidB", "UidA", "UidC")  
  foo$Win <- c(FALSE, TRUE, TRUE, FALSE)  
  #display foo  
  foo  
   Uid   Win  
1 UidA FALSE  
2 UidB  TRUE  
3 UidA  TRUE  
4 UidC FALSE  

  # my desired summary df has, for each UID: NWins (count of foo$Win == TRUE), NRunUps (count of foo$Win == FALSE)
  # here I initialise a holder for it  
  fooNUniques <- length(unique(foo$Uid))  
  fooSummary <- data.frame(Uids=character(fooNUniques),NWins=numeric(fooNUniques),NRunUps=numeric(fooNUniques))   
  fooSummary

  Uids NWins NRunUps
1          0       0
2          0       0
3          0       0
  #I can reference in to the result of applying table to get part of what I want  
  #First I get the table, this gets me a table by win/lose value  
  fooTable <- table(foo$Uid, foo$Win)  
  fooTable  

         FALSE TRUE  
  UidA     1    1  
  UidB     0    1  
  UidC     1    0  

  # I can get at the actual results via unname which gives me a matrix  
  fooTableAsMat <- unname(fooTable)  
  fooTableAsMat  
     [,1] [,2]  
[1,]    1    1  
[2,]    0    1  
[3,]    1    0  

  #but the UID vec is hidden in the table structure *somewhere* and   
  # I can't work out how to reference it out  

  #coercing the result to a data frame gives one row per Uid/Win pair, not the one row per Uid that I want

  as.data.frame(fooTable)  
    Var1  Var2 Freq  
  1 UidA FALSE    1  
  2 UidB FALSE    0  
  3 UidC FALSE    1  
  4 UidA  TRUE    1  
  5 UidB  TRUE    1  
  6 UidC  TRUE    0  

  #I have also tried 'aggregate' but have not made friends with it

Solution

  • Does this help?

    Using plyr:

    > library(plyr)
    > ddply(foo, .(Uid), summarise, NWin = sum(Win), NRunUp = sum(!Win))
    #    Uid NWin NRunUp
    # 1 UidA    1      1
    # 2 UidB    1      0
    # 3 UidC    0      1
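
On the literal question of where the labels are hidden: a table keeps them in its dimnames, so rownames(fooTable) should return the Uid vector directly. Below is a minimal base R sketch, assuming the foo data frame from the question. Wrapping Win in factor() with both levels is an added safeguard, not part of the original code, so that both count columns exist even if one outcome never occurs in the data.

```r
# Rebuild the test data from the question
foo <- data.frame(
  Uid = c("UidA", "UidB", "UidA", "UidC"),
  Win = c(FALSE, TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Cross-tabulate Uid against Win; fixing the factor levels guarantees
# that both a "FALSE" and a "TRUE" column are present (assumed safeguard)
fooTable <- table(foo$Uid, factor(foo$Win, levels = c(FALSE, TRUE)))

# The Uid labels are the row names of the table
rownames(fooTable)
# [1] "UidA" "UidB" "UidC"

# Assemble the summary; as.vector() drops the names carried by each column
fooSummary <- data.frame(
  Uids    = rownames(fooTable),
  NWins   = as.vector(fooTable[, "TRUE"]),
  NRunUps = as.vector(fooTable[, "FALSE"]),
  stringsAsFactors = FALSE
)
fooSummary
#   Uids NWins NRunUps
# 1 UidA     1       1
# 2 UidB     1       0
# 3 UidC     0       1
```

If the wide layout alone is enough, as.data.frame.matrix(fooTable) is another option: it keeps the Uids as row names and the FALSE/TRUE counts as columns.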