I have a dataset which contains data on the abundance of an organism and the sediment mud content % in which it was found.
I have subsequently partitioned the mud content data into 10 bins (i.e. 0 - 10%, 10.1 - 20% etc) and placed the abundance data into each bin accordingly.
The primary aim is to plot the maximum abundance in each mud bin over the mud gradient (i.e. 0 - 100 %) but for these maximums to be weighted by the number of samples in each bin.
So, my question is how to weight the maximum abundance in a given mud bin by the number of samples in each bin?
Here is a simple subset of my data:
Mud % bins: | 0 - 9 | 9.1 - 18 | 18.1 - 27 |
Abundance: | 10,10,2,2,2,1,1 | 15,15,15,2 | 20,20,20,1,1,1,1,1 |
You can use ddply from the plyr package for that. In the following code, mwtdabundance is your weighted abundance = (max of a bin * number of observations in that bin) / total number of observations.
For your sample data:
mydata<-structure(list(id = 1:19, bin = structure(c(1L, 1L, 1L, 1L, 1L,
1L, 1L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("0-9",
"18.1-27", "9.1-18"), class = "factor"), abundance = c(10L, 10L,
2L, 2L, 2L, 1L, 1L, 15L, 15L, 15L, 2L, 20L, 20L, 20L, 1L, 1L,
1L, 1L, 1L)), .Names = c("id", "bin", "abundance"), class = "data.frame", row.names = c(NA,
-19L))
> mydata
   id     bin abundance
1   1     0-9        10
2   2     0-9        10
3   3     0-9         2
4   4     0-9         2
5   5     0-9         2
6   6     0-9         1
7   7     0-9         1
8   8  9.1-18        15
9   9  9.1-18        15
10 10  9.1-18        15
11 11  9.1-18         2
12 12 18.1-27        20
13 13 18.1-27        20
14 14 18.1-27        20
15 15 18.1-27         1
16 16 18.1-27         1
17 17 18.1-27         1
18 18 18.1-27         1
19 19 18.1-27         1
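library(plyr)
# Per bin: maximum abundance, sample count, and the maximum weighted by the bin's share of all samples
ddply(mydata, .(bin), summarize, max.abundance = max(abundance), freq = length(bin), mwtdabundance = max.abundance * freq / nrow(mydata))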
      bin max.abundance freq mwtdabundance
1     0-9            10    7      3.684211
2 18.1-27            20    8      8.421053
3  9.1-18            15    4      3.157895
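If you would rather not depend on plyr, the same calculation can be sketched in base R with tapply. This is only an illustration under a couple of assumptions: the vectors abundance and bin are rebuilt by hand from the sample data above, and the names max_abundance, weighted and result are mine, not from the question. Here the factor levels are set explicitly in gradient order, so the rows come out as 0-9, 9.1-18, 18.1-27 rather than in the alphabetical order shown above.

```r
# Sample data from the question, rebuilt by hand (assumed layout: one value per sample).
abundance <- c(10, 10, 2, 2, 2, 1, 1,
               15, 15, 15, 2,
               20, 20, 20, 1, 1, 1, 1, 1)

# Bin labels, with levels given in mud-gradient order rather than alphabetical order.
bin <- factor(rep(c("0-9", "9.1-18", "18.1-27"), times = c(7, 4, 8)),
              levels = c("0-9", "9.1-18", "18.1-27"))

# Maximum abundance and number of samples in each bin.
max_abundance <- tapply(abundance, bin, max)
freq <- tapply(abundance, bin, length)

# Weight each maximum by the bin's share of all samples.
weighted <- max_abundance * freq / length(abundance)

result <- data.frame(bin = names(max_abundance),
                     max.abundance = as.vector(max_abundance),
                     freq = as.vector(freq),
                     mwtdabundance = as.vector(weighted))
print(result)
```

The weighted values should match the ddply output (for example 10 * 7 / 19 for the 0-9 bin), only with the rows ordered along the mud gradient, which may be more convenient when plotting.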