
How to compute Minitab-equivalent quartiles using NumPy


I have a homework assignment that I was doing in Minitab: find the quartiles and the interquartile range of a data set. When I tried to replicate the results with NumPy, I got different numbers. After some googling, I see that there are many different algorithms for computing quartiles, as listed here. I've tried every type of interpolation listed in the NumPy docs for the percentile function, but none of them matches Minitab's algorithm. Is there a lazy way to get Minitab's algorithm out of NumPy, or will I just need to roll my own implementation?
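To see why none of the options match, it helps to compare where each rule places the p-th quantile in the sorted sample x_(1) <= ... <= x_(n). As I understand it (my own summary, not quoted from either product's documentation), NumPy's default linear method uses the 1-based position h = (n - 1)p + 1, while the Minitab rule implemented in the answer below uses h = (n + 1)p. Both then interpolate linearly between neighbouring order statistics:

```latex
\[
h_{\text{NumPy linear}} = (n-1)\,p + 1, \qquad h_{\text{Minitab}} = (n+1)\,p,
\]
\[
Q(p) = x_{(\lfloor h \rfloor)} + \bigl(h - \lfloor h \rfloor\bigr)\bigl(x_{(\lfloor h \rfloor + 1)} - x_{(\lfloor h \rfloor)}\bigr).
\]
```

For n = 10 and p = 0.25 this gives h = 3.25 under linear but h = 2.75 under the Minitab rule, so the two estimates interpolate between different pairs of observations.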

Sample code:

import pandas as pd
import numpy as np

terrestrial = pd.Series([76.5, 6.03, 3.51, 9.96, 4.24, 7.74, 9.54, 41.7, 1.84, 2.5, 1.64])
aquatic = pd.Series([.27, .61, .54, .14, .63, .23, .56, .48, .16, .18])

df = pd.DataFrame({'terrestrial': terrestrial, 'aquatic': aquatic})
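One detail in the sample data matters for everything that follows: terrestrial has 11 values and aquatic has 10, so when the two Series are combined, pandas aligns them on their index and pads aquatic with a NaN in the last row. A quick check, using the same data:

```python
import pandas as pd

terrestrial = pd.Series([76.5, 6.03, 3.51, 9.96, 4.24, 7.74, 9.54, 41.7, 1.84, 2.5, 1.64])
aquatic = pd.Series([.27, .61, .54, .14, .63, .23, .56, .48, .16, .18])

# DataFrame aligns the Series on their index; the shorter one is padded with NaN
df = pd.DataFrame({'terrestrial': terrestrial, 'aquatic': aquatic})

n_rows = len(df)                              # rows in the combined frame
n_missing = int(df['aquatic'].isna().sum())   # NaN padding in 'aquatic'
aquatic_clean = df['aquatic'].dropna()        # only the real observations

print(n_rows, n_missing, len(aquatic_clean))
```

This is why the missing observation has to be dropped before computing quartiles: np.percentile does not skip NaN (np.nanpercentile does).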

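This is the method I used with NumPy:

# dropna() removes the NaN that pads the shorter aquatic column
# (the keyword was called interpolation= before NumPy 1.22)
q75, q25 = np.percentile(df.aquatic.dropna(), [75, 25], method='linear')
iqr = q75 - q25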
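A sketch that repeats the experiment for aquatic, trying each of the five classic options (linear, lower, higher, midpoint, nearest) and checking them against the Minitab quartiles reported below. It assumes NumPy 1.22 or newer, where the option is passed as method= rather than the older interpolation= keyword.

```python
import numpy as np

# The 10 non-missing 'aquatic' observations from the question
aquatic = np.array([.27, .61, .54, .14, .63, .23, .56, .48, .16, .18])

# Quartiles reported by Minitab for this variable
minitab_q1, minitab_q3 = 0.1750, 0.5725

classic_methods = ['linear', 'lower', 'higher', 'midpoint', 'nearest']
results = {}
for m in classic_methods:
    q1, q3 = np.percentile(aquatic, [25, 75], method=m)
    results[m] = (q1, q3)
    matches = bool(np.isclose(q1, minitab_q1) and np.isclose(q3, minitab_q3))
    print(f"{m:>8}: Q1={q1:.4f} Q3={q3:.4f} IQR={q3 - q1:.4f} matches Minitab: {matches}")
```

None of the five rows reproduces 0.1750 / 0.5725, which is consistent with the observation that the classic interpolation options cannot express Minitab's rule.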

The results from Minitab are different:

Descriptive Statistics: aquatic, terrestrial 

Variable         Q1      Q3     IQR
aquatic      0.1750  0.5725  0.3975
terrestrial    2.50    9.96    7.46
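Minitab's numbers are consistent with placing Q1 at position (n + 1)/4 and Q3 at position 3(n + 1)/4 in the sorted data, interpolating linearly when the position is fractional (the same rule the answer below implements). Sorted, aquatic is 0.14, 0.16, 0.18, 0.23, 0.27, 0.48, 0.54, 0.56, 0.61, 0.63 (n = 10) and terrestrial is 1.64, 1.84, 2.50, 3.51, 4.24, 6.03, 7.74, 9.54, 9.96, 41.7, 76.5 (n = 11). Working through both:

```latex
\begin{aligned}
&\text{aquatic } (n = 10): && Q_1:\ h = 11/4 = 2.75,\quad Q_1 = 0.16 + 0.75\,(0.18 - 0.16) = 0.1750\\
& && Q_3:\ h = 33/4 = 8.25,\quad Q_3 = 0.56 + 0.25\,(0.61 - 0.56) = 0.5725\\
& && \mathrm{IQR} = 0.5725 - 0.1750 = 0.3975\\
&\text{terrestrial } (n = 11): && Q_1:\ h = 12/4 = 3,\quad Q_1 = x_{(3)} = 2.50\\
& && Q_3:\ h = 36/4 = 9,\quad Q_3 = x_{(9)} = 9.96\\
& && \mathrm{IQR} = 9.96 - 2.50 = 7.46
\end{aligned}
```

All six values agree with the Minitab table.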

Solution

  • Here's an attempt to implement Minitab's algorithm. I've written these functions assuming that you've already dropped missing observations from the series a:

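    import numpy as np

    # Drop missing obs (aquatic has one NaN after DataFrame alignment)
    x = df.aquatic.dropna()

    def get_quartile1(a):
        """Q1 at 1-based position (n + 1) / 4, linearly interpolated."""
        a = a.sort_values()                  # Series.sort() no longer exists
        pos1 = (len(a) + 1) / 4.0
        round_pos1 = int(np.floor(pos1))     # whole part of the position
        first_part = a.iloc[round_pos1 - 1]
        extra_prop = pos1 - round_pos1       # fractional part
        interp_part = extra_prop * (a.iloc[round_pos1] - first_part)
        return first_part + interp_part

    get_quartile1(x)   # 0.175, up to floating-point rounding

    def get_quartile3(a):
        """Q3 at 1-based position 3(n + 1) / 4, linearly interpolated."""
        a = a.sort_values()
        pos3 = (3 * len(a) + 3) / 4.0
        # floor, not round: round() can pick the upper neighbour
        # (e.g. 6.75 -> 7 for n = 8), giving a negative extra_prop
        round_pos3 = int(np.floor(pos3))
        first_part = a.iloc[round_pos3 - 1]
        extra_prop = pos3 - round_pos3
        # whole-number position: nothing to interpolate, and
        # a.iloc[round_pos3] may be past the end (e.g. n = 3)
        if extra_prop == 0:
            return first_part
        interp_part = extra_prop * (a.iloc[round_pos3] - first_part)
        return first_part + interp_part

    get_quartile3(x)   # 0.5725, up to floating-point rounding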
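The two functions above share one rule, position h = p(n + 1) with linear interpolation, so they can be collapsed into a single helper. In the sketch below, minitab_quantile is a hypothetical name, not a library function. It works on plain arrays, drops NaN, and clamps positions outside 1..n to the smallest or largest observation; that clamping is my assumption about Minitab's behaviour for tiny samples, not something I have verified. It also compares against np.percentile(..., method='weibull'): in NumPy 1.22 and later this option implements Hyndman and Fan's type 6 estimator, which uses the same p(n + 1) position, so on a recent NumPy it may be the lazy solution the question asks for.

```python
import numpy as np

def minitab_quantile(a, p):
    """Hypothetical helper: quantile at 1-based position h = p * (n + 1),
    linearly interpolated between neighbouring order statistics.
    NaN values are dropped first."""
    x = np.asarray(a, dtype=float)
    x = np.sort(x[~np.isnan(x)])
    n = len(x)
    if n == 0:
        raise ValueError("need at least one non-missing observation")
    h = p * (n + 1)
    # Assumption: positions outside 1..n clamp to the min / max observation
    if h <= 1:
        return x[0]
    if h >= n:
        return x[-1]
    lo = int(np.floor(h))   # 1-based index of the lower neighbour
    frac = h - lo           # weight given to the upper neighbour
    return x[lo - 1] + frac * (x[lo] - x[lo - 1])

aquatic = [.27, .61, .54, .14, .63, .23, .56, .48, .16, .18]
terrestrial = [76.5, 6.03, 3.51, 9.96, 4.24, 7.74, 9.54, 41.7, 1.84, 2.5, 1.64]

for name, data in [('aquatic', aquatic), ('terrestrial', terrestrial)]:
    q1, q3 = minitab_quantile(data, 0.25), minitab_quantile(data, 0.75)
    # NumPy's built-in type-6 estimator (requires NumPy >= 1.22)
    w1, w3 = np.percentile(data, [25, 75], method='weibull')
    agrees = bool(np.isclose(q1, w1) and np.isclose(q3, w3))
    print(f"{name}: Q1={q1:.4f} Q3={q3:.4f} IQR={q3 - q1:.4f} weibull agrees: {agrees}")
```

If you are stuck on an older NumPy without the method keyword, the helper is the fallback; otherwise method='weibull' avoids maintaining custom code.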