Weighting maximum abundance by the number of samples

I have a dataset which contains data on the abundance of an organism and the sediment mud content % in which it was found.

I have subsequently partitioned the mud content data into 10 bins (i.e. 0 - 10%, 10.1 - 20% etc) and placed the abundance data into each bin accordingly.

The primary aim is to plot the maximum abundance in each mud bin over the mud gradient (i.e. 0 - 100 %) but for these maximums to be weighted by the number of samples in each bin.

So, my question is: how do I weight the maximum abundance in a given mud bin by the number of samples in that bin?

Here is a simple subset of my data:

Mud % bin  | Abundance
0 - 9      | 10, 10, 2, 2, 2, 1, 1
9.1 - 18   | 15, 15, 15, 2
18.1 - 27  | 20, 20, 20, 1, 1, 1, 1, 1


  • You can use ddply from the plyr package for that. In the following code, mwtdabundance is the weighted abundance: (maximum of a bin * number of observations in that bin) / total number of observations. For your sample data:

    mydata<-structure(list(id = 1:19, bin = structure(c(1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("0-9", 
    "18.1-27", "9.1-18"), class = "factor"), abundance = c(10L, 10L, 
    2L, 2L, 2L, 1L, 1L, 15L, 15L, 15L, 2L, 20L, 20L, 20L, 1L, 1L, 
    1L, 1L, 1L)), .Names = c("id", "bin", "abundance"), class = "data.frame", row.names = c(NA, -19L))
    > mydata
       id     bin abundance
    1   1     0-9        10
    2   2     0-9        10
    3   3     0-9         2
    4   4     0-9         2
    5   5     0-9         2
    6   6     0-9         1
    7   7     0-9         1
    8   8  9.1-18        15
    9   9  9.1-18        15
    10 10  9.1-18        15
    11 11  9.1-18         2
    12 12 18.1-27        20
    13 13 18.1-27        20
    14 14 18.1-27        20
    15 15 18.1-27         1
    16 16 18.1-27         1
    17 17 18.1-27         1
    18 18 18.1-27         1
    19 19 18.1-27         1
    > library(plyr)
    > ddply(mydata, .(bin), summarize, max.abundance = max(abundance), freq = length(bin), mwtdabundance = max.abundance * freq / nrow(mydata))
          bin max.abundance freq mwtdabundance
    1     0-9            10    7      3.684211
    2 18.1-27            20    8      8.421053
    3  9.1-18            15    4      3.157895
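
For comparison, here is a minimal base-R sketch of the same weighting, using the formula from the answer above (bin maximum * bin sample count / total samples). It is an illustration, not the answerer's code: the names bin_levels and res are made up for this example, and the factor levels are set explicitly on the assumption that the bins should follow the mud gradient rather than the default alphabetical order (which puts "18.1-27" before "9.1-18").

```r
# Sample data from the question, with bins ordered along the mud gradient.
# bin_levels and res are illustrative names, not part of the original answer.
bin_levels <- c("0-9", "9.1-18", "18.1-27")
mydata <- data.frame(
  bin = factor(rep(bin_levels, times = c(7, 4, 8)), levels = bin_levels),
  abundance = c(10, 10, 2, 2, 2, 1, 1,
                15, 15, 15, 2,
                20, 20, 20, 1, 1, 1, 1, 1)
)

# Per-bin maximum abundance and per-bin sample count.
max_abundance <- tapply(mydata$abundance, mydata$bin, max)
freq <- tapply(mydata$abundance, mydata$bin, length)

# Weighted maximum: bin maximum scaled by the bin's share of all samples.
res <- data.frame(
  bin = factor(names(max_abundance), levels = bin_levels),
  max.abundance = as.vector(max_abundance),
  freq = as.vector(freq)
)
res$mwtdabundance <- res$max.abundance * res$freq / nrow(mydata)

print(res)
```

The weighted values match the ddply output, only listed in gradient order. Because res keeps that order, something like barplot(res$mwtdabundance, names.arg = as.character(res$bin)) should draw the weighted maxima from low to high mud content.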