EDIT: small df added.
I have a small dataset called benthic_data of some benthic invertebrate indices (only one metric is included below).
Site <- c('S-7','S-7','S-7','S-7','S-7','S-27','S-27','S-27','S-27','S-27')
Sample <- c('S-7-1','S-7-2','S-7-3','S-7-4','S-7-5','S-27-1','S-27-2','S-27-3','S-27-4','S-27-5')
Abundance <- c(310, 316, 361,317, 321,108, 173, 189, 229, 210)
benthic_data <- data.frame(Site, Sample, Abundance)
I have converted Sample to a factor and would like to generate a figure that has one point for each sample, followed by a mean (with standard deviation whiskers) for each site.
benthic_data$Sample = factor(benthic_data$Sample, levels=c('S-7-1', 'S-7-2','S-7-3','S-7-4','S-7-5','S-27-1','S-27-2','S-27-3','S-27-4', 'S-27-5'))
A basic plot of the sites and their respective abundance value works fine (I will make the figure prettier later):
ggplot(benthic_data, aes(x=Sample, y=Abundance, fill=Site))+
geom_point(data = benthic_data, size = 4.0, colour="black", shape=21, show.legend = F)+
scale_fill_manual(values = c("darkgreen", "orangered3"))
To calculate the mean and SD for each site, I used the following code. I also want the mean/SD point for each site to be labelled S-7 Mean and S-27 Mean, respectively, so I relabelled the sites and converted them to a factor.
benthic_summary<- as.data.frame(benthic_data) %>%
group_by(Site) %>%
summarize(mean=mean(Abundance, na.rm=T),
sd=sd(Abundance, na.rm=T))
benthic_summary$Site = plyr::revalue(benthic_summary$Site, c("S-7" = "S-7 Mean","S-27"="S-27 Mean"))
benthic_summary$Site <- factor(benthic_summary$Site, levels= c("S-7 Mean","S-27 Mean"))
Now, to combine the five points for each site plus the mean/SD for each site, I used geom_pointrange with the code below. I had to add two more colours to scale_fill_manual because I otherwise got this error message: Error: Insufficient values in manual scale. 4 needed but only 2 provided.
This code works, except that I need the S-7 samples to come first (it is an upstream site), followed by the S-27 samples, and the legend does not show the proper site colours.
Site S-7 should be green and site S-27 should be orangered.
AEMP_cols=c("darkgreen", "orangered3")
ggplot(benthic_data, aes(x=Sample, y=Abundance, fill=Site))+
geom_point(data = benthic_data, size = 4.0, colour="black", shape=21, show.legend = F)+
scale_fill_manual(values = c("darkgreen","darkgreen", "orangered3", "orangered3"))+
geom_pointrange(data = benthic_summary, aes(x = Site, y=mean, ymin=mean-sd, ymax=mean+sd), colour = AEMP_cols, size =1, shape = 15)
So, I would like help making sure that the order of samples (points) on the x-axis is: S-7-1, S-7-2 .... S-7-5, S-7 Mean, then S-27-1, S-27-2 .... S-27-5, S-27 Mean. This is similar to what the code above creates, but with the S-7 samples first, followed by the S-27 samples.
I can easily recreate the code for the other indices so I am just starting with Abundance for now.
Any help would be appreciated. Thanks.
Are you looking for something like this?
I started by generating a second data frame with the calculated mean and SD of each site, then added it as additional rows to the original dataset. I reordered the levels of the factors Sample and Site, and finally passed the result to ggplot using geom_point and geom_errorbar:
library(dplyr)
library(ggplot2)
Mean_DF <- benthic_data %>%
group_by(Site) %>%
summarise(Mean = mean(Abundance), SD = sd(Abundance)) %>%
# Build the label from Site itself so it does not depend on the row order of the summary
mutate(Sample = paste0(Site, "-Mean")) %>% rename(Abundance = Mean)
benthic_data %>% select(Site, Sample, Abundance) %>% bind_rows(., Mean_DF) %>%
mutate(Site = factor(Site, levels = c("S-7","S-27"))) %>%
mutate(Sample = factor(Sample, levels=c('S-7-1', 'S-7-2','S-7-3','S-7-4','S-7-5','S-7-Mean','S-27-1','S-27-2','S-27-3','S-27-4', 'S-27-5','S-27-Mean'))) %>%
ggplot(aes(x = Sample, y = Abundance, color = Site))+
geom_point()+
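# Only the two mean rows have an SD; na.rm = TRUE silences the warning for the sample rows
geom_errorbar(aes(ymin = Abundance-SD, ymax = Abundance+SD), width = 0.2, na.rm = TRUE)+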
scale_color_manual(values = c("darkgreen", "orangered3"))
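Since you mention repeating this for other indices, typing out all twelve factor levels by hand may get tedious. Here is a minimal base R sketch of one way to build that level order instead, assuming the samples already appear in the desired order within each site (as they do in your example). The names site_order and sample_levels are just ones I made up for illustration:

```r
# Rebuild the example data from the question.
Site <- rep(c("S-7", "S-27"), each = 5)
Sample <- paste0(Site, "-", rep(1:5, times = 2))
Abundance <- c(310, 316, 361, 317, 321, 108, 173, 189, 229, 210)
benthic_data <- data.frame(Site, Sample, Abundance)

# Desired site order on the x-axis (upstream site first).
site_order <- c("S-7", "S-27")

# For each site, take its samples in order of appearance,
# then append the "<site>-Mean" label used for the summary row.
sample_levels <- unlist(lapply(site_order, function(s) {
  c(as.character(benthic_data$Sample[benthic_data$Site == s]),
    paste0(s, "-Mean"))
}))

print(sample_levels)
```

The resulting vector could then be passed as levels = sample_levels in the factor(Sample, ...) call above, in place of the hand-typed list.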