python statistics statsmodels robust

Why is the MAD calculated with scipy.stats.median_absolute_deviation different from the one computed by my own function?


Below is my code. I define a MAD function that implements the formula for the mean absolute deviation; the other two print statements compute the MAD with scipy.stats and with statsmodels.robust. As the results show, the library values differ from mine, and I do not understand why their estimates differ so much from the one I compute manually.

    import numpy as np
    import scipy.stats as stats
    from statsmodels import robust

    def MAD(vector):
        # Mean absolute deviation around the mean
        MAD = np.sum(np.abs(vector - np.mean(vector))) / len(vector)
        return MAD

    print("MAD", MAD([1.5, 0, 4, 2.5]))
    print("MAD function from stats", stats.median_absolute_deviation([1.5, 0, 4, 2.5], axis=0))
    print("MAD function from robust", robust.mad([1.5, 0, 4, 2.5]))

Results:

MAD 1.25
MAD function from stats 1.8532499999999998
MAD function from robust 1.8532527731320025

Solution

  • First, both functions apply a normalization constant by default (equivalent to multiplying by about 1.4826) to make the MAD a consistent estimator of the standard deviation for normally distributed data. If we turn this off by setting the factor to 1.0, the results are identical.

    Second, both functions compute the median of the absolute deviations from the median. The median and mean happen to coincide for this particular vector, but to match the default behavior of these two functions in general you should use the median instead of the mean, both as the center and when aggregating the deviations.

    import numpy as np
    import scipy.stats as stats
    from statsmodels import robust

    def MAD(vector):
        # Median of the absolute deviations from the median
        vector = np.asarray(vector)
        return np.median(np.abs(vector - np.median(vector)))
    
    print("MAD",MAD([1.5,0,4,2.5]))
    print("MAD function from stats", stats.median_absolute_deviation([1.5,0,4,2.5],axis=0,scale=1.0))
    print("MAD function from robust", robust.mad([1.5,0,4,2.5],c=1.0))
    

    MAD 1.25
    MAD function from stats 1.25
    MAD function from robust 1.25
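
To see where the default factor comes from, here is a minimal sketch that uses only the standard library. The names `raw_mad`, `scaled_mad` and `NORMAL_SCALE` are made up for this illustration and are not part of SciPy or statsmodels. It assumes the usual normal-consistency constant, 1 / Φ⁻¹(0.75), which reproduces the 1.8532... value printed in the question.

```python
from statistics import NormalDist, fmean, median

# Normal-consistency constant: 1 / Phi^{-1}(0.75), roughly 1.4826.
# (Illustrative name, not a library constant.)
NORMAL_SCALE = 1.0 / NormalDist().inv_cdf(0.75)


def raw_mad(values):
    """Median of the absolute deviations from the median, unscaled."""
    center = median(values)
    return median(abs(v - center) for v in values)


def scaled_mad(values, scale=NORMAL_SCALE):
    """Raw MAD multiplied by a scale factor (normal consistency by default)."""
    return scale * raw_mad(values)


data = [1.5, 0, 4, 2.5]
print("raw MAD:", raw_mad(data))                  # 1.25
print("scale factor:", round(NORMAL_SCALE, 4))    # 1.4826
print("scaled MAD:", round(scaled_mad(data), 4))  # 1.8533

# On skewed data, averaging the deviations no longer agrees with the
# median of the deviations, so the choice of aggregation matters.
skewed = [1, 2, 3, 4, 100]
center = median(skewed)
print("median of deviations:", raw_mad(skewed))                        # 1
print("mean of deviations:", fmean(abs(v - center) for v in skewed))   # 20.2
```

One thing to keep in mind: as far as I know, newer SciPy releases replace `median_absolute_deviation` with `stats.median_abs_deviation`, whose `scale` defaults to 1.0, so the normalization has to be requested explicitly with `scale='normal'` there. Check the documentation for the version you have installed.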