Tags: python, python-3.x, numpy, statistics, python-internals

Applying statistics methods on numpy arrays: unexpected results


Please explain why the two calls below give different results.

import statistics
x = [0,1]
statistics.mean(x) 
## 0.5

But:

import numpy 
import statistics
x = numpy.array([0,1])
statistics.mean(x) 
## 0

I'm pretty sure it's a basic, well-known, over-discussed issue: please link to a duplicate, as I couldn't find one.


Solution

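  • The reason is that the statistics module has an internal conversion helper, _convert, which checks whether the data type is a subclass of int. That check succeeds for int but fails for np.int32, so the fractional result is passed straight to the NumPy integer type and truncated.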

    import statistics
    from decimal import Decimal
    from fractions import Fraction

    import numpy as np

    # _convert is a private helper of the statistics module
    a = statistics._convert(Fraction('1/2'), int)       # 0.5
    b = statistics._convert(Fraction('1/2'), np.int32)  # 0

    # Source of statistics._convert, quoted for reference
    def _convert(value, T):
        """Convert value to given numeric type T."""
        if type(value) is T:
            return value

        #### THIS BIT WORKS FOR int BUT NOT FOR np.int32 ###
        # np.int32 is not a subclass of int, so T is never switched to float
        if issubclass(T, int) and value.denominator != 1:
            T = float

        try:
            return T(value)
        except TypeError:
            if issubclass(T, Decimal):
                return T(value.numerator)/T(value.denominator)
            else:
                raise
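
The subclass check can be verified directly. This is only a minimal sketch: the variable names are illustrative, and it tests the NumPy scalar types themselves rather than calling anything inside statistics.

```python
import numpy as np

# NumPy integer scalar types are not subclasses of the built-in int,
# so an issubclass(T, int) test is False for them.
int_types_are_int = [issubclass(t, int) for t in (np.int32, np.int64)]

# By contrast, np.float64 does subclass the built-in float.
float64_is_float = issubclass(np.float64, float)

# Calling a NumPy integer type on a non-integral value truncates it.
truncated = np.int64(0.5)

print(int_types_are_int, float64_is_float, truncated)
```

Because the check is False, the value goes through T(value) with T still a NumPy integer type, which is where the fractional part is lost.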
    

    Therefore, you can either use statistics with a list, or numpy with an array:

    1. Use statistics.mean([0, 1]); or
    2. Use np.mean(np.array([0, 1])), or np.array([0, 1]).mean().
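
If the data is already in an array, a third option is to convert it to built-in Python numbers first. A small sketch of both routes, assuming an integer array x as in the question (the variable names are illustrative):

```python
import statistics

import numpy as np

x = np.array([0, 1])

# tolist() yields built-in Python ints, so statistics sees plain int values.
mean_from_list = statistics.mean(x.tolist())

# NumPy's own mean computes in floating point for integer input.
mean_from_numpy = x.mean()

print(mean_from_list, mean_from_numpy)
```

Both routes give 0.5 here. Which one to prefer mostly depends on whether the rest of the code already works with NumPy arrays.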