
Add variable with summarise but keep all other variables in R


I have a dataset with donations made to different politicians where each row is a specific donation.

donor.sector <- c("sector A", "sector B", "sector X", "sector A", "sector B")
total <- c(100, 100, 150, 125, 500)
year <- c(2006, 2006, 2007, 2007, 2007)
state <- c("CA", "CA", "CA", "NY", "WA")
target_specific <- c("politician A", "politician A", "politician A", "politician B", "politician C")
dat <- data.frame(donor.sector, total, year, target_specific, state)
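Both approaches in the answer below only make sense if state is a politician-level variable, meaning it never changes within a politician. Here is a quick base-R check of that assumption. It is a sketch that rebuilds the same five example rows so it runs on its own; the name `states_per_politician` is just illustrative.

```r
# Rebuild the example data (same values as above) so this snippet is self-contained
dat <- data.frame(
  donor.sector = c("sector A", "sector B", "sector X", "sector A", "sector B"),
  total = c(100, 100, 150, 125, 500),
  year = c(2006, 2006, 2007, 2007, 2007),
  target_specific = c("politician A", "politician A", "politician A",
                      "politician B", "politician C"),
  state = c("CA", "CA", "CA", "NY", "WA")
)

# Count distinct states per politician (hypothetical helper name).
# A value of 1 everywhere means state can be carried through a grouped summary
# without splitting or losing information.
states_per_politician <- tapply(dat$state, dat$target_specific,
                                function(s) length(unique(s)))
print(states_per_politician)
```

If any count were greater than 1, adding state to `group_by` would split that politician's yearly mean, and `first(state)` would silently drop the other states.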

I'm trying to get the yearly mean of donations for each politician, and I can do so with the following:

library(dplyr)
new.df <- dat %>%
  group_by(target_specific, year) %>%
  summarise(mean = mean(total))

My issue is that, because I'm grouping, the result only has three variables: mean, year and target_specific. Is there a way to do this and create a new data frame that keeps the politician-level variables, such as state?

Many thanks!


Solution

  • There are two ways to do that:

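    Include the additional variables in group_by. Because state does not vary within a politician, adding it to the grouping does not split any group: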

    library(dplyr)
    
    dat %>%
      group_by(target_specific, year, state) %>%
      summarise(mean = mean(total))
    
    #  target_specific  year state  mean
    #  <chr>           <dbl> <chr> <dbl>
    #1 politician A     2006 CA      100
    #2 politician A     2007 CA      150
    #3 politician B     2007 NY      125
    #4 politician C     2007 WA      500
    

    Alternatively, keep the same group_by structure and take the first value of each additional variable with first():

    dat %>%
      group_by(target_specific, year) %>%
      summarise(mean = mean(total), state = first(state))
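If you want the same results without dplyr, here is a base-R sketch. It is not part of the original answer and assumes only the `stats` functions `aggregate()` and `ave()`. The first part mirrors the group_by-with-state approach. The second part keeps every row and every column and attaches the group mean to each donation, which is what `group_by(target_specific, year) %>% mutate(mean = mean(total))` would do in dplyr.

```r
# Rebuild the example data so this snippet runs on its own
dat <- data.frame(
  donor.sector = c("sector A", "sector B", "sector X", "sector A", "sector B"),
  total = c(100, 100, 150, 125, 500),
  year = c(2006, 2006, 2007, 2007, 2007),
  target_specific = c("politician A", "politician A", "politician A",
                      "politician B", "politician C"),
  state = c("CA", "CA", "CA", "NY", "WA")
)

# One row per politician-year, with state included in the grouping formula
by_year <- aggregate(total ~ target_specific + year + state,
                     data = dat, FUN = mean)
names(by_year)[names(by_year) == "total"] <- "mean"
by_year <- by_year[order(by_year$target_specific, by_year$year), ]
print(by_year)

# Keep all rows and columns: ave() returns each row's group mean,
# aligned with the original rows
dat$mean <- ave(dat$total, dat$target_specific, dat$year, FUN = mean)
print(dat)
```

The `ave()` route is useful when you need the yearly mean next to every individual donation, for example to compare each donation against its politician's average.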