Search code examples
rdataframeaggregatedcast

Combination of reshaping and aggregating a data.frame


I'm still somewhat new so please forgive any missteps, but I currently have a data.frame that looks like this.

Outcome1  Outcome2  Num_Occurances Name
False       False       2          John Doe
False       True        2          John Doe
True        False       4          John Doe
True        True        2          John Doe
False       True        1          Sally Doe
True        False       1          Sally Doe

I would like to both reshape and aggregate the data into a wider format, to eventually look like this:

  • successful_outcome2 is a sum of the True values in Outcome2
  • Total_Occurances is the sum of the Num_Occurances for each name
  • successful_outcome1 is where both Outcome1 and Outcome2 are True
  • Total_Occurances_Outcome1 is the summation of all True responses in the Outcome1 category
Name        successful_Outcome2  Total_Occurances  successful_Outcome1    Total_Occurances_Outcome1 
John Doe           4                  10                   2                        6
Sally Doe          1                   2                   0                        1

I understand that the dcast function can be used to melt and recast data into a wide format, but the combination of the different outcomes is throwing me for a loop. Any help would be appreciated!


Solution

  • A base R solution with aggregate + transform, i.e.,

    dfout <- aggregate(.~Name,
                       transform(df,
                                 successful_outcome2 = Outcome2*Num_Occurances,
                                 Total_Occurances = Num_Occurances,
                                 successful_Outcome1 = Outcome1*Outcome2*Num_Occurances,
                                 Total_Occurances_Outcome1 = Outcome1*Num_Occurances),
                       sum)[-(2:4)]
    

    yielding

    > dfout
           Name successful_outcome2 Total_Occurances successful_Outcome1 Total_Occurances_Outcome1
    1  John Doe                   4               10                   2                         6
    2 Sally Doe                   1                2                   0                         1
    

    DATA

    df <- structure(list(Outcome1 = c(FALSE, FALSE, TRUE, TRUE, FALSE, 
    TRUE), Outcome2 = c(FALSE, TRUE, FALSE, TRUE, TRUE, FALSE), Num_Occurances = c(2L, 
    2L, 4L, 2L, 1L, 1L), Name = c("John Doe", "John Doe", "John Doe", 
    "John Doe", "Sally Doe", "Sally Doe")), class = "data.frame", row.names = c(NA, 
    -6L))