r dplyr summary

How do I add summary-statistics values to each row of a data frame?


I would like to know a tidyverse way to add summary statistics back to each row of a data frame.

The code below works, but there should be a quicker way, right?

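library("tidyverse")
data <- iris

# Per-species means of the two sepal measurements (one row per species)
means <- iris %>%
  group_by(Species) %>%
  summarise(
    Sepal.Length = mean(Sepal.Length),
    Sepal.Width = mean(Sepal.Width)
  )

# Join the means back onto every row; the clashing column names
# come out with .x (original) and .y (mean) suffixes
data <- merge(data, means, by = "Species")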

Solution

  • One way to do this is to use mutate() after group_by(), which keeps every row instead of collapsing each group to one.

    library("tidyverse")
    data <- iris

    # mutate() on grouped data keeps all rows and repeats each group's mean
    data <- data %>%
      group_by(Species) %>%
      mutate(
        Sepal.Length.y = mean(Sepal.Length),
        Sepal.Width.y = mean(Sepal.Width)
      )

    This does the same thing as your code but skips the separate summarise() and merge() steps. If you want the columns in a different order, you can rearrange them afterwards, for example with select() or relocate(). I would also recommend giving the mean columns names other than Sepal.Length and Sepal.Width: in your version merge() had to add .x and .y suffixes to tell the duplicated names apart, while inside mutate() reusing an existing name would overwrite the original column. Hope this helps.
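
If there are more than a couple of columns to summarise, the same grouped mutate() can probably be written once with across(). This is only a sketch, assuming dplyr 1.0.0 or later; the ".mean" suffix is an illustrative naming choice and not something from the original post.

```r
library(dplyr)

# Add the per-species mean of each selected column as a new column.
# ".names" controls the new column names; "{.col}.mean" is a hypothetical
# naming scheme chosen here so the original columns are not overwritten.
data <- iris %>%
  group_by(Species) %>%
  mutate(across(c(Sepal.Length, Sepal.Width), mean, .names = "{.col}.mean")) %>%
  ungroup()  # drop the grouping so later steps work on the whole data frame

head(data)
```

The result should still have one row per original observation, with Sepal.Length.mean and Sepal.Width.mean appended after the existing columns. The ungroup() at the end is optional, but it avoids later verbs silently operating per species.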