I need help writing a Python function that calculates the mean and standard deviation from N-1 samples.
I have 96 rows of quadruplicate samples: a total of 384 samples in a 96x4 numpy array.
For each row, I would like to:
1. Take out one sample from the quadruplicate so it becomes a triplicate
   [30,38,23,21] becomes [38,23,21]
2. Calculate the mean and standard deviation of the triplicate
   mean = 27.33, stdev = 9.29
3. Put the sample back so the row is a quadruplicate again
   [38,23,21] becomes [30,38,23,21]
4. Repeat Steps 1-3 three more times, taking out a different sample each time
   [30,23,21]: mean = 24.67, stdev = 4.73
   [30,38,21]: mean = 29.67, stdev = 8.50
   [30,38,23]: mean = 30.33, stdev = 7.51
5. Find the mean with the lowest standard deviation among the calculated values
   [30,23,21]: mean = 24.67, stdev = 4.73
6. Move on to the next row and repeat Steps 1-5
The output is a 96x1 array holding the selected mean for each corresponding row.
Basically, I want to calculate the mean and standard deviation under the assumption that one of the quadruplicates is an outlier.
I tried writing a function with nested for-loops, but it became too long and ugly. I need advice on a smarter way.
I came up with the following:
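    import numpy as np

    def bestMean(rows):
        bestMeans = []
        for row in rows:
            # np.delete(row, k) drops sample k; row[:k] + row[k+1:] only
            # concatenates for lists, NumPy arrays would be added element-wise.
            subsets = [np.delete(row, k) for k in range(len(row))]
            mean = [np.mean(s) for s in subsets]
            # ddof=1 gives the sample standard deviation, matching the
            # figures in the question (e.g. 4.73 for [30,23,21]).
            std = [np.std(s, ddof=1) for s in subsets]
            bestMeans.append((mean[np.argmin(std)], np.min(std)))
        return bestMeans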
I did a quick test and it seemed to work. Note that this isn't the fastest option out there, but it is quite readable.
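If speed ever matters, the per-row loop can probably be replaced by array operations. The following is only a sketch under a few assumptions: the input is a 2-D array with one replicate group per row, the sample standard deviation (ddof=1) is wanted as in the question's figures, and ties are broken by taking the first minimum. The name best_mean_vectorized is made up for this example.

```python
import numpy as np

def best_mean_vectorized(data):
    """Return, for each row, the leave-one-out mean with the lowest sample std."""
    data = np.asarray(data, dtype=float)
    n_cols = data.shape[1]
    # keep[k] is a boolean mask that is False only at column k.
    keep = ~np.eye(n_cols, dtype=bool)
    # subsets[i, k] is row i with sample k removed:
    # shape (n_rows, n_cols, n_cols - 1).
    subsets = np.stack([data[:, keep[k]] for k in range(n_cols)], axis=1)
    means = subsets.mean(axis=2)
    stds = subsets.std(axis=2, ddof=1)  # sample standard deviation
    # Index of the leave-one-out subset with the smallest std, per row.
    best = stds.argmin(axis=1)
    return means[np.arange(data.shape[0]), best]

result = best_mean_vectorized([[30, 38, 23, 21]])
```

The result has shape (n_rows,); if a 96x1 column is needed, result[:, None] adds the extra axis.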