For a list of percentage data, I need to check whether the last value (90.2) is noticeably higher than the rest of the data, i.e. somewhat "abnormal". Clearly it is in this sequence:
delivery_pct = [59.45, 55.2, 54.16, 66.57, 68.62, 64.19, 60.57, 44.12, 71.52, 90.2]
But in the sequence below, the last value is not abnormal:
delivery_pct = [63.6, 62.64, 60.36, 72.8, 70.86, 40.51, 52.06, 61.47, 51.55, 74.03]
How do I check if the last value is abnormally higher than the rest?
About the data: each value is between 0 and 100%. Because the values are the delivery percentages for a stock over the last 10 days, they usually stay within a band set by the nature of the stock (heavily traded vs. less frequently traded). The exception is when something good happens to the stock and delivery on that day jumps in anticipation of good news.
Once you've decided on a threshold (how many standard deviations from the mean counts as abnormal), you could do this:
import statistics
t = 2  # threshold in standard deviations; this is the crucial value
pct = [59.45, 55.2, 54.16, 66.57, 68.62, 64.19, 60.57, 44.12, 71.52, 90.2]
mean = statistics.mean(pct)
tsd = statistics.pstdev(pct) * t  # t population standard deviations
lo = mean - tsd
hi = mean + tsd
# print every value outside the band mean ± t*sd
print(*[x for x in pct if x < lo or x > hi], sep="\n")
Output:
90.2
It's the threshold value t that effectively determines what counts as "abnormal".
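The filter above is two-sided and includes 90.2 itself when computing the mean and standard deviation, which inflates the spread. With t = 2 it also flags 40.51 in the second sequence, on the low side. Since the question only asks whether the last value is abnormally high, a minimal sketch of a one-sided variant could compare the last value against the preceding values only. The helper name `last_is_abnormal` is hypothetical, and using the sample standard deviation (`statistics.stdev`) of the baseline is an assumption rather than part of the answer above.

```python
import statistics

def last_is_abnormal(values, t=2.0):
    """Hypothetical helper: True if values[-1] lies more than t sample
    standard deviations above the mean of the preceding values."""
    baseline = values[:-1]  # exclude the value being tested from the baseline
    if len(baseline) < 2:
        raise ValueError("need at least two baseline values")
    mean = statistics.mean(baseline)
    sd = statistics.stdev(baseline)  # sample standard deviation of the baseline
    return values[-1] > mean + t * sd  # one-sided: only "abnormally high" counts

seq1 = [59.45, 55.2, 54.16, 66.57, 68.62, 64.19, 60.57, 44.12, 71.52, 90.2]
seq2 = [63.6, 62.64, 60.36, 72.8, 70.86, 40.51, 52.06, 61.47, 51.55, 74.03]
print(last_is_abnormal(seq1))  # True
print(last_is_abnormal(seq2))  # False
```

For these two lists the last value sits roughly 3.5 and 1.4 baseline standard deviations above the baseline mean, so any t between about 1.5 and 3.4 separates them.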
The interquartile range (IQR) method produces the same result:
import statistics
pct = [59.45, 55.2, 54.16, 66.57, 68.62, 64.19, 60.57, 44.12, 71.52, 90.2]
spct = sorted(pct)
m = len(spct) // 2
Q1 = statistics.median(spct[:m])  # median of the lower half
m += len(spct) % 2  # increment m if list length is odd (skip the middle value)
Q3 = statistics.median(spct[m:])  # median of the upper half
IQR = Q3 - Q1
lo = Q1 - 1.5 * IQR  # Tukey's fences
hi = Q3 + 1.5 * IQR
print(*[x for x in pct if x < lo or x > hi], sep="\n")
Output:
90.2
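The median-of-halves quartiles above are only one of several quartile conventions, and for a list of 10 values the choice can decide the outcome. Here is a sketch that applies the same 1.5 × IQR upper fence to the last value only, with quartiles taken from `statistics.quantiles` (Python 3.8+). The helper name `last_above_iqr_fence` is hypothetical. With `method="inclusive"` the fence for the first list is about 85.9, so 90.2 is flagged. With the default `method="exclusive"` the fence is about 91.0, so it is not.

```python
import statistics

def last_above_iqr_fence(values, k=1.5, method="inclusive"):
    """Hypothetical helper: True if values[-1] exceeds Q3 + k * IQR,
    with quartiles computed over the whole list by statistics.quantiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method=method)
    return values[-1] > q3 + k * (q3 - q1)  # only the upper fence matters here

seq1 = [59.45, 55.2, 54.16, 66.57, 68.62, 64.19, 60.57, 44.12, 71.52, 90.2]
seq2 = [63.6, 62.64, 60.36, 72.8, 70.86, 40.51, 52.06, 61.47, 51.55, 74.03]
print(last_above_iqr_fence(seq1))                      # True
print(last_above_iqr_fence(seq2))                      # False
print(last_above_iqr_fence(seq1, method="exclusive"))  # False
```

With so few data points, it may be worth checking how sensitive the verdict is to both the quartile method and the multiplier k before relying on it.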