dplyr: pipes inside of summarize after group_by


I have this data.frame:

df_test = structure(list(`MAE %` = c(-0.0647202646339709, -0.126867775585001, 
-1.81159420289855, -1.03092783505155, -2.0375491194877, -0.160783192796913, 
-0.585827216261999, -0.052988554472234, -0.703351261894911, -0.902996305924203, 
-0.767676767676768, -0.0101091791346543, -0.0134480903711673, 
-0.229357798165138, -0.176407935028625, -0.627062706270627, -1.75706139769261, 
-1.23024009524439, -0.257391763463569, -0.878347259688137, -0.123613523987705, 
-1.65711947626841, -2.11718534838887, -0.256285931980328, -1.87152777777778, 
-0.0552333609500138, -0.943983402489627, -0.541095890410959, 
-0.118607409474639, -0.840453845076341), Profit = c(7260, 2160, 
-7080, 3600, -8700, 6300, -540, 10680, -1880, -3560, -720, 5400, 
5280, 1800, 11040, -240, -2320, 2520, 10300, -2520, 8400, -9240, 
-5190, 7350, -6790, 3600, -3240, 8640, 7150, -2400)), .Names = c("MAE %", 
"Profit"), row.names = c(NA, 30L), class = "data.frame")

Now I want some summary statistics like:

df_test %>% 
    group_by(win.g = Profit > 0) %>%
    summarise(GroupCnt  = n(),
              TopMAE    = filter(`MAE %` > -1) %>% sum(Profit),
              BottomMAE = filter(`MAE %` <= -1) %>% sum(Profit))

So the data is grouped by whether Profit > 0 or Profit <= 0. Then I want the sum() of Profit for rows with MAE % > -1 (TopMAE) and for rows with MAE % <= -1 (BottomMAE). Both sums must be calculated within each group.

The expected result looks like:

#  win.g GroupCnt TopMAE BottomMAE
#1 FALSE       14 -15100    -39320
#2  TRUE       16  95360      6120

But my R code does not work. I get an error:

Error: no applicable method for 'filter_' applied to an object of class "logical"

I changed my code according to the error:

df_test %>% 
    group_by(win.g = Profit > 0) %>%
    summarise(UnderStop = n(),
              TopMAE    = filter(., `MAE %` > -1) %>% sum(Profit),
              BottomMAE = filter(., `MAE %` <= -1) %>% sum(Profit))

But there is still no result. I get another error:

Error: incorrect length (14), expecting: 16

I tried to understand the grouping behavior and how to use piping inside summarise after grouping, but I did not succeed. I spent a whole day on it.

How can I get my expected result table? Please help me understand the dplyr logic for grouping and for calculating functions on those groups.


Solution

  • Is this what you are looking for? (Only asking because I get different results than your output.)

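    library(dplyr)

    # Inside summarise(), Profit and `MAE %` are the current group's vectors,
    # so plain logical indexing is enough; no filter() is needed.
    df_test %>%
        group_by(win.g = Profit > 0) %>%
        summarise(GroupCnt  = n(),
                  TopMAE    = sum(Profit[`MAE %` > -1]),
                  BottomMAE = sum(Profit[`MAE %` <= -1]))

    #Source: local data frame [2 x 4]

    #  win.g GroupCnt TopMAE BottomMAE
    #  (lgl)    (int)  (dbl)     (dbl)
    #1 FALSE       14 -15100    -39320
    #2  TRUE       16  95360      6120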
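
For what it's worth, here is a minimal sketch of why the indexing form works, on made-up toy data (the data frame `toy`, its columns `mae` and `profit`, and all values are assumptions for illustration, not the `df_test` from the question). Inside `summarise()`, a bare column name is the vector of values for the current group, so `profit[mae > -1]` is ordinary logical subsetting done once per group. `filter()`, by contrast, expects a data frame as its first argument, which is why handing it a logical vector produced the `filter_` error. In the second attempt, as far as I can tell, `.` is the whole piped data frame rather than the current group, so the grouping is lost at that point.

```r
library(dplyr)

# Toy data (hypothetical values, chosen only to make the sums easy to check)
toy <- data.frame(
  mae    = c(-0.5, -1.5, -0.2, -2.0, -0.8, -1.2),
  profit = c(100, 50, -30, -70, 200, -10)
)

res <- toy %>%
  group_by(win = profit > 0) %>%
  summarise(
    cnt    = n(),                     # rows in this group
    top    = sum(profit[mae > -1]),   # this group's profit where mae > -1
    bottom = sum(profit[mae <= -1])   # this group's profit where mae <= -1
  )

print(res)
# win = FALSE: cnt = 3, top = -30, bottom = -80
# win = TRUE:  cnt = 3, top = 300, bottom =  50
```

The same pattern should carry over to `df_test` by swapping in `Profit` and `` `MAE %` `` (backticks are needed because of the space and percent sign in the name).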