Calculating mean and standard deviation and ignoring 0 values


I have a list of lists whose sublists all contain float values. For example, the one below has 2 lists, each with its own sublists:

mylist = [[[2.67, 2.67, 0.0, 0.0], [2.67, 2.67, 2.0, 2.0], [2.67, 2.67, 2.0, 2.0], [2.67, 2.67, 2.0, 2.0], [2.67, 2.67, 2.0, 2.0], [2.67, 2.67, 2.0, 2.0], [0.0, 2.67, 2.0, 2.0], [2.67, 2.67, 2.0, 2.0], [2.67, 2.67, 2.0, 2.0], [2.67, 2.67, 2.0, 2.0]], [[2.67, 2.67, 2.0, 2.0], [0.0, 2.67, 2.0, 2.0], [2.67, 2.67, 2.0, 2.0], [2.67, 2.67, 2.0, 2.0], [2.67, 2.67, 2.0, 2.0], [0.0, 0.0, 0.0, 0.0], [2.67, 2.67, 2.0, 2.0], [2.67, 2.67, 2.0, 2.0], [2.67, 2.67, 2.0, 2.0], [2.67, 2.67, 2.0, 2.0], [2.67, 2.67, 2.0, 2.0], [2.67, 2.67, 2.0, 2.0], [2.67, 2.67, 2.0, 2.0], [2.67, 2.67, 2.0, 2.0]]]

I want to calculate the mean and the standard deviation of the sublists. This is what I tried:

mean = [statistics.mean(d) for d in mylist]
stdev = [statistics.stdev(d) for d in mylist]

However, this also includes the 0.0 values, which I do not want: they are placeholders I set to 0 so that the entries would not be empty. Is there a way to ignore these zeros, as if they did not exist in the sublist, so that they are not taken into consideration at all? I could not find a way to do this with my current approach.


Solution

  • You can replace the zeros with NaN and then use NumPy's nanmean and nanstd functions, which ignore NaN values.

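    import numpy as np


    def zero_to_nan(d):
        # Copy into a float array so NaN can be stored, then mask the zeros.
        array = np.array(d, dtype=float)
        array[array == 0] = np.nan  # the np.NaN alias was removed in NumPy 2.0
        return array


    # Each call reduces over all values of one top-level list, skipping NaN.
    mean = [np.nanmean(zero_to_nan(d)) for d in mylist]
    # nanstd defaults to the population standard deviation (ddof=0);
    # pass ddof=1 to get the sample value that statistics.stdev returns.
    stdev = [np.nanstd(zero_to_nan(d)) for d in mylist]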
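
If you would rather stay with the standard library, one possible sketch is to flatten each top-level list and drop the zeros before calling statistics.mean and statistics.stdev. The helper name nonzero_values and the shortened mylist sample below are illustrative assumptions, not part of the question. Keep in mind that statistics.stdev computes the sample standard deviation, so its results will differ slightly from np.nanstd with its default ddof=0.

```python
import statistics

# Small illustrative sample with the same nesting as the question's data.
mylist = [
    [[2.67, 2.67, 0.0, 0.0], [2.67, 2.67, 2.0, 2.0]],
    [[0.0, 2.67, 2.0, 2.0], [0.0, 0.0, 0.0, 0.0]],
]


def nonzero_values(group):
    """Flatten one top-level list and drop the 0.0 placeholders."""
    return [x for sublist in group for x in sublist if x != 0]


# One mean and one sample standard deviation per top-level list.
mean = [statistics.mean(nonzero_values(g)) for g in mylist]
stdev = [statistics.stdev(nonzero_values(g)) for g in mylist]
```

A group that contains only zeros would be left with no data points, and statistics.stdev raises StatisticsError when it gets fewer than two values, so such groups would need separate handling.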