Sum is not computed over groups (always gives the absolute total)

I'm creating some summary tables and I'm having a hard time with simple sums...

While the count of records is correct, the variables with sums always compute the same value for all groups.

This is the code:

SummarybyCallContext <- PSTNRecords %>% 
                            group_by (PSTNRecords$destinationContext) %>% 
                                summarize(
                                  Calls = n(), 
                                  Minutes = sum(PSTNRecords$durationMinutes),
                                  Charges = sum(PSTNRecords$charge), 
                                  Fees = sum(PSTNRecords$connectionCharge)
                                )
                                  
SummarybyCallContext

And this is the result:

Minutes and Charges should be different for each group (Fees is always zero, but I need to display it anyway in the table).

Setting na.rm to TRUE or FALSE doesn't seem to change the result.

What am I doing wrong?

Thanks in advance!

~Alienvolm

Solution

(Almost) Never use PSTNRecords$ within dplyr verb functions in a pipeline starting from PSTNRecords. Why? With the $-indexing, every reference is to the original data, before any grouping or filtering or adding/changing columns or rearranging is done. Without the $-referencing, it is using the columns as they appear at that point in the pipeline.

SummarybyCallContext <- PSTNRecords %>% 
                            group_by (destinationContext) %>% 
                                summarize(
                                  Calls = n(), 
                                  Minutes = sum(durationMinutes),
                                  Charges = sum(charge), 
                                  Fees = sum(connectionCharge)
                                )

There are exceptions to this, but they are rare and, for the vast majority of new dplyr users, generally done better via other mechanisms.

Demonstration:

dat <- data.frame(x=1:5)
dat %>%
  filter(dat$x > 2) %>%      # this still works okay, since `dat` and "data now" are same
  summarize(x2 = dat$x[1])   # however, `dat` has 5 rows but data in pipe only has 3 rows
#   x2
# 1  1
dat %>%
  filter(x > 2) %>%
  summarize(x2 = x[1])
#   x2
# 1  3