python, statistics

Python - Check if the last value in a sequence is relatively higher than the rest


For a list of percentage data, I need to check whether the last value (90.2) is abnormally high compared with the rest of the data. In this sequence it clearly is.

delivery_pct = [59.45, 55.2, 54.16, 66.57, 68.62, 64.19, 60.57, 44.12, 71.52, 90.2]

But in the sequence below, the last value is not:

delivery_pct = [ 63.6, 62.64, 60.36, 72.8, 70.86, 40.51, 52.06, 61.47, 51.55, 74.03 ]

How do I check if the last value is abnormally higher than the rest?

About the data: each data point lies between 0 and 100%. Because it is the percentage of delivery taken for a stock over the last 10 days, it usually stays within a range that depends on the nature of the stock (highly traded vs. less frequently traded), unless good news is anticipated and delivery of that stock is higher on that day.


Solution

  • Once you've determined a threshold (a number of standard deviations from the mean), you could do this:

    import statistics
    
    t = 2  # threshold in standard deviations; this is the crucial value
    
    pct = [59.45, 55.2, 54.16, 66.57, 68.62, 64.19, 60.57, 44.12, 71.52, 90.2]
    
    mean = statistics.mean(pct)
    tsd = statistics.pstdev(pct) * t
    
    lo = mean - tsd
    hi = mean + tsd
    
    print(*[x for x in pct if x < lo or x > hi], sep="\n")
    

    Output:

    90.2
    

    It's the threshold value t that (effectively) determines what counts as "abnormal".

    The interquartile range (IQR) method produces the same result for this data:

    import statistics
    
    pct = [59.45, 55.2, 54.16, 66.57, 68.62, 64.19, 60.57, 44.12, 71.52, 90.2]
    spct = sorted(pct)
    m = len(spct) // 2
    
    Q1 = statistics.median(spct[:m])
    m += len(spct) % 2 # increment m if list length is odd
    Q3 = statistics.median(spct[m:])
    
    IQR = Q3 - Q1
    
    lo = Q1 - 1.5 * IQR
    hi = Q3 + 1.5 * IQR
    
    print(*[x for x in pct if x < lo or x > hi], sep="\n")
    

    Output:

    90.2
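
Both snippets above list every outlier in the data and include the last value when computing the baseline. Since the question asks only whether the last value is abnormally high, one possible variation is to compare it against the mean and standard deviation of the earlier values alone. The sketch below is not part of the original answer: the helper name `last_is_abnormally_high` is hypothetical, and the default `t=2.0` is simply carried over from the first snippet as an assumption.

```python
import statistics


def last_is_abnormally_high(values, t=2.0):
    """Return True if the last value exceeds mean + t * stdev of the earlier values.

    Hypothetical helper for illustration; t is an assumed threshold.
    """
    *rest, last = values  # the baseline excludes the value being tested
    if len(rest) < 2:
        raise ValueError("need at least two earlier values to form a baseline")
    mean = statistics.mean(rest)
    sd = statistics.pstdev(rest)
    # Only the upper side is checked, because the question is about "higher".
    return last > mean + t * sd


spike = [59.45, 55.2, 54.16, 66.57, 68.62, 64.19, 60.57, 44.12, 71.52, 90.2]
normal = [63.6, 62.64, 60.36, 72.8, 70.86, 40.51, 52.06, 61.47, 51.55, 74.03]

print(last_is_abnormally_high(spike))   # True
print(last_is_abnormally_high(normal))  # False
```

Leaving the last value out of the baseline keeps a large spike from inflating the standard deviation it is being measured against. As with the first snippet, the choice of `t` is what decides how large a jump counts as abnormal.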