Search code examples
riqr

Finding IQR of groups of rows


I am wanting to find the IQR of a range of values in a dataframe. These values are also grouped, therefore I need to find the IQR of each group in the dataframe. I have the following table:

 Block DNAname  Spot_Size   Molarity    Cy3_Fluorescence
 1  DNA 01  100pl   100 14266
 1  DNA 01  100pl   100 16020
 1  DNA 01  100pl   100 15705
 1  DNA 01  100pl   100 15783
 1  DNA 01  100pl   100 15834
 1  DNA 01  100pl   50  12248
 1  DNA 01  100pl   50  12209
 1  DNA 01  100pl   50  12511
 1  DNA 01  100pl   50  12316
 1  DNA 01  100pl   50  12469
 1  DNA 01  100pl   25  9626
 1  DNA 01  100pl   25  9804
 1  DNA 01  100pl   25  9794
 1  DNA 01  100pl   25  10020
 1  DNA 01  100pl   25  9739
 1  DNA 01  100pl   10  7158
 1  DNA 01  100pl   10  6802
 1  DNA 01  100pl   10  7378
 1   DNA 01 100pl   10  5949
 1  DNA 01  100pl   10  7484
 1  DNA 01  100pl   5   5257
 1  DNA 01  100pl   5   5560
 1  DNA 01  100pl   5   6076
 1  DNA 01  100pl   5   5925

I run the following code to find the IQR:

aggregate(Cy3.DNA1.100pl.1uM$Cy3_Fluorescence, list(Molarity=
    Cy3.DNA1.100pl.1uM$Molarity, Spot_Size=Cy3.DNA1.100pl.1uM$Spot_Size ), IQR)

This gives me the ouput:

   Molarity  Spot_Size   x
      5     100pl    384
     10     100pl    576
     25     100pl     65
     50     100pl    221
    100     100pl    129

This output groups all the molarities correctly but the IQR is not correct. If the code above has mean as a function instead of IQR the value for x (the function value) is correct as so:

   Molarity Spot_Size       x
    5     100pl       5752.4
   10     100pl       6954.2
   25     100pl       9796.6
   50     100pl      12350.6
  100     100pl      15521.6

The expected IQRS should be as follows:

Molarity IQR
100      324.25
50       258
25       363
10       519.5
5        400

Any help would be much appreciated. If any one has any ideas how I could perform this function for IQR, when there are groups of spot sizes (where spot size ranges from 100pl-400pl) including the molarity categories I would like to hear them.

Thank you kindly SO.


Solution

  • It's not clear if your problem is about aggregating, or about your (??) definition of IQR. There are many ways to calculate IQR (see this and this). As far as I can tell, none of them yield the results in your post.

    As far as aggregating based on spot size and molarity, here are two ways:

    # use aggregate(...) in base R - will be slow with large datasets
    aggregate(Cy3_Fluorescence~Molarity+Spot_Size,df,IQR)
    #   Molarity Spot_Size Cy3_Fluorescence
    # 1        5     100pl            478.5
    # 2       10     100pl            576.0
    # 3       25     100pl             65.0
    # 4       50     100pl            221.0
    # 5      100     100pl            129.0
    
    # use data.table - will be extremely fast.
    library(data.table)
    setDT(df)[,list(IQR=IQR(Cy3_Fluorescence)),by=list(Molarity,Spot_Size)]
    #    Molarity Spot_Size   IQR
    # 1:      100     100pl 129.0
    # 2:       50     100pl 221.0
    # 3:       25     100pl  65.0
    # 4:       10     100pl 576.0
    # 5:        5     100pl 478.5