Performing operations on dplyr summaries


Assume we have some random data:

data <- data.frame(ID = rep(1:3, 3),
                   Var = sample(1:9, 9))

We can compute summaries using dplyr, like this:

library(dplyr)
data %>%
  group_by(ID) %>%
  summarize(count = n_distinct(Var))

which gives output that looks like this below an R Markdown chunk:

ID count
1   3           
2   3           
3   3   

I would like to know how we can perform operations on individual data points in this dplyr output without saving the output in a separate object.

For example, in the output of summarise, let's say we wanted to subtract the output value for ID == 3 from the sum of the output values for ID == 1 and ID == 2, and leave the output values for ID == 1 and ID == 2 as they are. The only way I know to do this is to save the summary output in another object and perform the operation on that object, like this:

a <- data %>%
  group_by(ID) %>%
  summarize(count = n_distinct(Var))
a
# now perform the operation on a:
# row 3 of count becomes ID[2] + count[2] - 1
a[3, 2] <- a[2, 1] + a[2, 2] - 1
a

a now looks like this:

ID count
1   3           
2   3           
3   4

Is there a way to do this in dplyr output without making new objects? Can we somehow use mutate directly on output like this?


Solution

  • We can add a mutate after the summarise and use replace to modify the element of count at a given position (here n(), the last row)

    library(dplyr)
    data %>%
       group_by(ID) %>%
       summarize(count = n_distinct(Var)) %>%
       # n() is the number of rows (3 here), so replace() overwrites the last element
       mutate(count = replace(count, n(), count[2] + ID[2] - 1))
    

    -output

    # A tibble: 3 x 2
         ID count
      <int> <dbl>
    1     1     3
    2     2     3
    3     3     4
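
    Positions such as n() and count[2] depend on the row order of the summary. Below is a minimal sketch of the same replacement with rows selected by ID value instead. It assumes fixed Var values (made up here so the result is reproducible), and the name by_id is hypothetical, used only so the result can be inspected:

    ```r
    library(dplyr)

    # Fixed data (hypothetical values) so the result is reproducible
    data <- data.frame(ID  = rep(1:3, 3),
                       Var = c(4, 8, 1, 9, 2, 6, 5, 3, 7))

    by_id <- data %>%
      group_by(ID) %>%
      summarize(count = n_distinct(Var)) %>%
      # a logical index picks the row to overwrite; the new value is
      # built from the row where ID == 2, as in the code above
      mutate(count = replace(count, ID == 3,
                             count[ID == 2] + ID[ID == 2] - 1))
    by_id
    ```

    Since every ID has 3 distinct values here, the counts should come out as 3, 3, 4, the same as above. For the calculation as worded in the question (the sum of the values for ID 1 and ID 2, minus the value for ID 3), the replacement value would be count[ID == 1] + count[ID == 2] - count[ID == 3] instead.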
    

    Or, if there are more than two columns, use sum on the sliced row:

    data %>%
       group_by(ID) %>%
       summarize(count = n_distinct(Var)) %>%
       mutate(count = replace(count, n(),
                              sum(cur_data() %>% slice(2)) - 1))
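
    Note that cur_data() is deprecated as of dplyr 1.1.0 in favor of pick(). Here is a sketch of the same row-sum approach with pick(everything()), assuming dplyr >= 1.1.0 and the same made-up Var values as a reproducible stand-in for the random data (row_sum is a hypothetical name used only for inspection):

    ```r
    library(dplyr)  # pick() assumes dplyr >= 1.1.0

    # Fixed data (hypothetical values) so the result is reproducible
    data <- data.frame(ID  = rep(1:3, 3),
                       Var = c(4, 8, 1, 9, 2, 6, 5, 3, 7))

    row_sum <- data %>%
      group_by(ID) %>%
      summarize(count = n_distinct(Var)) %>%
      # pick(everything()) returns the current columns (ID and count);
      # slice(..., 2) keeps the second row, and sum() adds up its values
      mutate(count = replace(count, n(),
                             sum(slice(pick(everything()), 2)) - 1))
    row_sum
    ```

    With only ID and count present, the sum over row 2 is ID[2] + count[2], so this should match the first version; with more numeric columns, every column in that row contributes to the sum.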