Context: I conducted clam surveys at different sites and measured the clams' sizes. The surveys did not cover the same total area because of differences in low tides, extent of the clam bed, etc. As a result, some sites may have high clam density (#/m^2) but a small surveyed area, so the total count is low, while others may have the opposite characteristics (or any other combination).
I am trying to create a faceted histogram to show size frequencies at each site while removing the effect of the amount of area surveyed at each site. Essentially, I want frequencies that reflect each site's density (occurrences per unit area) so I can compare across sites and see overall differences in size distribution AND relative frequency.
Here are some example data:
site<-c(rep("D",5),rep("C",10),rep("B",10),rep("A",20))
size<-c(1,2,2,2,3,
1,1,2,2,2,2,2,2,3,3,
1,1,2,2,2,2,2,2,3,3,
1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,3,3,3,3)
area<-c(rep(10,5),rep(20,10),rep(10,10),rep(20,20))
density<-c(rep(5/10,5),rep(10/20,10),rep(10/10,10),rep(20/20,20))
description<-c(rep("Low Density 0.5, Low Area 10",5),rep("Low Density 0.5, High Area 20",10),rep("High Density 1.0, Low Area 10",10),rep("High Density 1.0, High Area 20",20))
d<-data.frame(site,size,area,density,description)
I know I can graph the histogram with basic counts on the y-axis, which shows the effect of area and density:
library(ggplot2)
# after_stat() replaces the deprecated stat() helper (ggplot2 >= 3.3.0)
ggplot(d, aes(x=size,fill=site))+
  geom_histogram(aes(y=after_stat(count),group=site))+
  facet_grid(site~.)
[Figure: histogram of counts, influenced by area surveyed]
Or I can scale the y-axis to display relative frequencies so the total across all sites = 1, which also illustrates the influence of area surveyed and density:
ggplot(d, aes(x=size,fill=site))+
  geom_histogram(aes(y=after_stat(count/sum(count)),group=site))+
  facet_grid(site~.)
[Figure: relative frequency across all sites, influenced by area surveyed]
Or I can scale the y-axis to display relative frequencies by site, so the total within each site = 1, which removes the effects of density AND area (not what I want, since this only lets me compare differences in size distribution, not density):
ggplot(d, aes(x=size,fill=site))+
  geom_histogram(aes(y=after_stat(density*width),group=site))+
  facet_grid(site~.)
[Figure: relative frequency within each site]
I really want to remove the effect of area so that the graph displays differences in density. For this example, it should look like the following graph. (Note: I had to manipulate the dataset to artificially create this graph as an example.)
[Figure: ideal graph example]
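As a rough sketch of the bar heights such a graph would show for the example data, assuming that "frequency reflecting density" means the count in each size class divided by the area surveyed at that site (`counts`, `site_area`, and `per_area` are illustrative names, and the data are rebuilt so the snippet runs on its own):

```r
# Rebuild the example data from the question.
site <- c(rep("D", 5), rep("C", 10), rep("B", 10), rep("A", 20))
size <- c(1, 2, 2, 2, 3,
          1, 1, 2, 2, 2, 2, 2, 2, 3, 3,
          1, 1, 2, 2, 2, 2, 2, 2, 3, 3,
          1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3)
area <- c(rep(10, 5), rep(20, 10), rep(10, 10), rep(20, 20))
d <- data.frame(site, size, area)

# Count clams in each site-by-size cell (rows: site, columns: size).
counts <- table(d$site, d$size)

# Area surveyed at each site, in the same row order as `counts`.
site_area <- as.numeric(tapply(d$area, d$site, unique)[rownames(counts)])

# Divide each row by its site's area: clams per unit area in each size class.
per_area <- sweep(counts, 1, site_area, "/")
print(per_area)
```

Under this assumption, the heights within each site sum to that site's density (1.0 for A and B, 0.5 for C and D), so sites with equal density get bars of equal height regardless of how much area was surveyed.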
Can anyone help me figure out how to display differences in density across sites while removing the effect of total area surveyed?
Thank you in advance!
Does this do what you want?
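library(tidyverse)
d %>%
  # one row per site/size combination, with n = number of clams
  count(site, size, area, description) %>%
  # pull the density value (third word) out of the description text
  mutate(density = parse_number(word(description, 3))) %>%
  group_by(site) %>%
  # within-site proportion scaled by site density; the constant 3 rescales all bars equally
  mutate(adj = n / sum(n) / 3 * density) %>%
  ggplot(aes(size, adj, fill = description)) +
  geom_col() +
  facet_wrap(~site, ncol = 1)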
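An alternative sketch that avoids parsing `description`: weight each clam by `1 / area`, so that each bar's height is the number of clams in that size class per unit area surveyed. This relies on the `weight` aesthetic that `geom_histogram()` accepts through `stat_bin()`. The `binwidth = 1` setting and the y-axis label are assumptions chosen to suit the integer sizes in the example data, and the data are rebuilt here so the snippet runs on its own. Because `n / sum(n) * density` equals `n / area`, these heights differ from `adj` above only by the constant factor of 3.

```r
library(ggplot2)

# Rebuild the example data from the question.
site <- c(rep("D", 5), rep("C", 10), rep("B", 10), rep("A", 20))
size <- c(1, 2, 2, 2, 3,
          1, 1, 2, 2, 2, 2, 2, 2, 3, 3,
          1, 1, 2, 2, 2, 2, 2, 2, 3, 3,
          1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3)
area <- c(rep(10, 5), rep(20, 10), rep(10, 10), rep(20, 20))
d <- data.frame(site, size, area)

# Each observation contributes 1 / area to its bin instead of 1,
# so bin heights are counts per unit area rather than raw counts.
p <- ggplot(d, aes(x = size, weight = 1 / area, fill = site)) +
  geom_histogram(binwidth = 1) +
  facet_grid(site ~ .) +          # shared y-axis keeps sites comparable
  labs(y = "clams per unit area")
p
```

With real, continuous size measurements, `binwidth` would need to be chosen to match the measurement scale; the weighting itself stays the same.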