Given the list of lists below, I want to create new lists giving the mean and standard deviation of each column.
a = [ [1, 2, 3],
[2, 3, 4],
[3, 4, 5, 6],
[1, 2],
[7, 2, 3, 4]]
Required result:
mean = 2.8, 2.6, 3.75, 5
STDEV= 2.48997992, 0.894427191, 0.957427108, 1.414213562
I found the example below, which computes the averages and seems to work very well, but I wasn't clear on how to adapt it for the standard deviation:
import numpy as np
import numpy.ma as ma
from itertools import zip_longest
a = [ [1, 2, 3],
[2, 3, 4],
[3, 4, 5, 6],
[1, 2],
[7, 2, 3, 4]]
averages = [np.ma.average(ma.masked_values(temp_list, None)) for temp_list in zip_longest(*a)]
print(averages)
You can use these two lines:
>>> np.nanmean(np.array(list(zip_longest(*a)),dtype=float),axis=1)
array([2.8 , 2.6 , 3.75, 5. ])
>>> np.nanstd(np.array(list(zip_longest(*a)),dtype=float),axis=1,ddof=1)
array([2.48997992, 0.89442719, 0.95742711, 1.41421356])
np.nanmean and np.nanstd compute the mean and standard deviation, respectively, while ignoring NaN values. So you are passing them the array:
>>> np.array(list(zip_longest(*a)),dtype=float)
array([[ 1., 2., 3., 1., 7.],
[ 2., 3., 4., 2., 2.],
[ 3., 4., 5., nan, 3.],
[nan, nan, 6., nan, 4.]])
And computing the mean and standard deviation of each row, ignoring NaNs. The ddof argument stands for delta degrees of freedom (the divisor used is N - ddof, where N is the number of non-NaN values), and I set it to 1 based on your desired output (default is 0).
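To make the effect of ddof concrete, here is a minimal sketch that wraps the two lines above in a helper. The name column_stats is hypothetical (it is not part of NumPy), and padding with fillvalue=np.nan instead of the default None is just a choice to make the missing entries explicit.

```python
import numpy as np
from itertools import zip_longest

a = [[1, 2, 3],
     [2, 3, 4],
     [3, 4, 5, 6],
     [1, 2],
     [7, 2, 3, 4]]


def column_stats(rows, ddof=1):
    """Hypothetical helper: per-column mean and std of ragged rows."""
    # zip_longest transposes the ragged rows, so each row of `padded`
    # is one column of the input; missing entries are filled with NaN.
    padded = np.array(list(zip_longest(*rows, fillvalue=np.nan)), dtype=float)
    # nanmean/nanstd skip the NaN padding; the std divisor is N - ddof,
    # where N counts only the non-NaN values in each row of `padded`.
    means = np.nanmean(padded, axis=1)
    stds = np.nanstd(padded, axis=1, ddof=ddof)
    return means, stds


means, sample_std = column_stats(a)            # ddof=1: sample std
_, population_std = column_stats(a, ddof=0)    # ddof=0: population std

print(means)
print(sample_std)
print(population_std)
```

With ddof=1 this should reproduce the values in your required result. Note that a column containing only a single value would give NaN (with a RuntimeWarning) for ddof=1, since the divisor N - ddof is then 0.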