I am trying to summarize a dataset.
I am looking to produce a table with counts and averages all in one.
Example data:
df <- data.frame(
"Species" = c("A","B","C","D","A","B","C","D"),
"Location" = c("A","B","C","B","A","D","D","E"),
"Sample size" = c(1,30,6,2,5,10,3,300),
"Frequency"=c(0,0.3,80,0.5,0.01,0.6,1,2)
)
df
Data produces a table like this:
Species Country Sample.size Frequency
1 A A 1 0
2 B B 30 0.3
3 C C 6 80
4 D B 2 0.5
5 A A 5 0.01
6 B D 10 0.6
7 C D 3 1
8 D E 300 2
I am trying to make a table with a column for: species, a count for the number of times a species occurs, a count for the number of countries a species occurs in, an average for sample size per species, and an average frequency per species.
Essentially, I am trying to get a table like this:
Species species_count #_of_Countries Avg_Sample.size Avg_Frequency
A 2 2 10 0
B 2 3 3 0.01
C 3 4 1 20
D 5 1 5 0.5
I am relatively new to R, so any help would be appreciated!
I think this is what you want library(dplyr)
Summary_df <- df %>%
group_by(species) %>%
summarize(species_count = n(),
country_count = sum(!is.na(Country)),
Avg_sample_size = mean(Sample.size),
Avg_frequency = mean(Frequency))