Search code examples
rdplyrcountaveragesummarize

Calculate counts and averages by group for various columns in a dataset


I am trying to summarize a dataset.

I am looking to produce a table with counts and averages all in one.

Example data:

df <- data.frame(
    "Species" = c("A","B","C","D","A","B","C","D"), 
    "Location" =  c("A","B","C","B","A","D","D","E"), 
    "Sample size" = c(1,30,6,2,5,10,3,300), 
    "Frequency"=c(0,0.3,80,0.5,0.01,0.6,1,2)
  )

df

Data produces a table like this:

     Species Country Sample.size Frequency
1       A       A           1         0
2       B       B          30       0.3
3       C       C           6        80
4       D       B           2       0.5
5       A       A           5      0.01
6       B       D          10       0.6
7       C       D           3         1
8       D       E         300         2

I am trying to make a table with a column for: species, a count for the number of times a species occurs, a count for the number of countries a species occurs in, an average for sample size per species, and an average frequency per species.

Essentially, I am trying to get a table like this:

Species species_count #_of_Countries Avg_Sample.size Avg_Frequency
A       2             2              10              0
B       2             3              3               0.01
C       3             4              1               20
D       5             1              5               0.5

I am relatively new to R, so any help would be appreciated!


Solution

  • I think this is what you want library(dplyr)

    Summary_df <- df %>% 
    group_by(species) %>%
    summarize(species_count = n(), 
                     country_count = sum(!is.na(Country)), 
                 Avg_sample_size = mean(Sample.size), 
                 Avg_frequency = mean(Frequency))