Tags: python, statistics, controls, trend

Method to identify the point where numbers fall off sharply


I have a series of numbers:

numbers = [100, 101, 99, 102, 99, 98, 100,  97.5, 98, 99, 95, 93, 90, 85, 80]

[Figure: plot of numbers]

It's very easy to see by eye that the numbers start to fall sharply at roughly x = 10, but is there a simple way to identify that point (or something close to it) on the x axis?

This is being done in retrospect, so you can use the entire list of numbers to select the x axis point where the dropoff accelerates.

Python solutions are preferred, but pseudo-code or a general methodology is fine too.
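As a sketch of one general methodology (a swapped-in technique, separate from the running-statistics approach in the Solution section): fit a continuous "broken-stick" model, meaning two straight lines joined at a hinge, and choose the hinge position with the smallest residual sum of squares. This assumes the series is roughly linear on each side of the break. The function name `broken_stick_breakpoint` and its `min_segment` parameter are hypothetical, chosen for illustration.

```python
import numpy as np


def broken_stick_breakpoint(y, min_segment=3):
    """Return the hinge index k that minimises the residual sum of squares of
    the continuous two-segment model y ~ a + b*x + c*max(0, x - k).

    Hypothetical helper: k is the last x before the slope changes. Returns
    None if the series is too short to leave `min_segment` points per side.
    """
    y = np.asarray(y, dtype=float)
    x = np.arange(len(y), dtype=float)
    best_k, best_sse = None, np.inf
    # Candidate hinges keep at least `min_segment` points on each side
    for k in range(min_segment - 1, len(y) - min_segment):
        # Columns: intercept, overall slope, extra slope after the hinge
        design = np.column_stack([np.ones_like(x), x, np.maximum(0.0, x - k)])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        sse = float(np.sum((y - design @ coef) ** 2))
        if sse < best_sse:
            best_k, best_sse = k, sse
    return best_k


numbers = [100, 101, 99, 102, 99, 98, 100, 97.5, 98, 99, 95, 93, 90, 85, 80]
print(broken_stick_breakpoint(numbers))
```

Trying every hinge is O(n) least-squares fits, which is cheap for short retrospective series like this one. The returned index is a model-based estimate, so it is worth plotting it against the eyeballed point before relying on it.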


Solution

  • OK, this ended up fitting my needs. I calculate a running mean and standard deviation, then use the CDF of a t distribution to estimate how unlikely each successive value is, given the values before it.

    This only detects decreases, since it only checks for a lower-tail probability (CDF) below 0.05, but it works very well.

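    import numpy as np
    from scipy import stats
    import matplotlib.pyplot as plt
    
    numbers = np.array([100, 101, 99, 102, 99, 98, 100, 97.5, 98, 99, 95, 93, 90, 85, 80])
    n = len(numbers)
    min_history = 3  # need a few earlier values before the std is meaningful
    
    # Running mean: cum_mean[i] is the mean of numbers[:i + 1]
    cum_mean = numbers.cumsum() / np.arange(1, n + 1)
    
    # Running sample standard deviation over the same window as cum_mean
    # (ddof=1; a single value has no sample std, so index 0 is left as NaN)
    cum_std = np.array([numbers[:i + 1].std(ddof=1) if i > 0 else np.nan
                        for i in range(n)])
    
    # z value of each number against the mean and std of all numbers before it
    cum_z = (numbers[1:] - cum_mean[:-1]) / cum_std[:-1]
    
    # Align z_vals with numbers; values with too little history are set to NaN
    # (NaN < 0.05 is False, so they are never flagged)
    z_vals = np.concatenate(([np.nan], cum_z))
    z_vals[:min_history] = np.nan
    
    # Lower-tail probability from a t distribution; i earlier values give
    # i - 1 degrees of freedom (clipped to 1 for the NaN-only leading entries)
    dof = np.maximum(np.arange(n) - 1, 1)
    cum_t = stats.t.cdf(z_vals, dof)
    
    # Identify the first number to fall below the threshold (None if none do)
    hits = np.flatnonzero(cum_t < 0.05)
    first_deviation = int(hits[0]) if hits.size else None
    
    fig, ax = plt.subplots()
    
    # Plot the numbers and mark the point immediately prior to the decrease
    ax.plot(numbers)
    if first_deviation is not None:
        ax.axvline(first_deviation - 1, color='red')
    plt.show()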
    

    [Figure: numbers with drop detected]
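The solution notes that it only detects decreases because it checks only the lower tail (CDF < 0.05). Below is a minimal sketch of a reusable variant, assuming the same idea: score each value against the mean and sample standard deviation of all earlier values, and compare the score with a t distribution with i - 1 degrees of freedom. An optional two-sided check splits `alpha` between both tails so that sharp rises are flagged too. The name `first_unusual_index` and its parameters are hypothetical.

```python
import numpy as np
from scipy import stats


def first_unusual_index(values, alpha=0.05, min_history=3, two_sided=False):
    """Return the first index whose value is unlikely given all earlier values,
    or None if no value qualifies (hypothetical helper).

    values[i] (for i >= min_history) is scored against the mean and sample
    standard deviation of values[:i], using a Student's t distribution with
    i - 1 degrees of freedom.
    """
    x = np.asarray(values, dtype=float)
    for i in range(min_history, len(x)):
        prior = x[:i]
        std = prior.std(ddof=1)
        if std == 0:
            continue  # a perfectly flat history gives no scale to compare against
        p_lower = stats.t.cdf((x[i] - prior.mean()) / std, df=i - 1)
        if two_sided:
            # Flag unusually low or unusually high values, alpha/2 per tail
            if p_lower < alpha / 2 or p_lower > 1 - alpha / 2:
                return i
        elif p_lower < alpha:
            return i
    return None


numbers = [100, 101, 99, 102, 99, 98, 100, 97.5, 98, 99, 95, 93, 90, 85, 80]
print(first_unusual_index(numbers))                  # lower tail only
print(first_unusual_index(numbers, two_sided=True))  # both tails
```

On the question's list both calls flag index 10 (the value 95). Returning early keeps the function cheap for retrospective use, at the cost of reporting only the first flagged point.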