python statistics analytics

Normalizing a monotonically increasing function and calculating the standard deviation


I have an increasing function like so:

image

I plan on breaking it up into intervals (between the red lines). I want to rotate each segment so that it is horizontal and then calculate the standard deviation.

I know this might seem silly, but I essentially want to calculate the variation after normalizing the increasing ramp per segment. What method can I use to achieve this?

My initial thought is to calculate the slope and draw a line with that slope from the beginning to the end of the segment. Then I would calculate the delta of each data point with respect to that line.
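
To make that idea concrete, here is a rough sketch of what I have in mind for a single segment. The function name `chord_residual_std` and the sample numbers are made up purely for illustration; they are not my real data.

```python
import numpy as np

def chord_residual_std(x, y):
    """SD of the deviations from the straight line joining the first and last points."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # slope of the line from the first point to the last point of the segment
    slope = (y[-1] - y[0]) / (x[-1] - x[0])
    line = y[0] + slope * (x - x[0])
    # delta of each data point with respect to that line
    return np.std(y - line)

# made-up segment, for illustration only
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.0, 1.5, 2.0, 2.5, 4.0])
print(chord_residual_std(x, y))  # approximately 0.316
```

Note that a line forced through the two end points depends heavily on those two values, so a noisy first or last point would shift every delta.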


Solution

  • I had a quick play with writing some code. I grabbed some numbers from your sketched function using https://apps.automeris.io/wpd/, giving me:

    csvdata = """\
    x,y
    22937.2,1822.1
    22942.9,1822.2
    22950.0,1822.4
    22959.6,1822.5
    22967.8,1822.5
    22976.8,1822.6
    22987.1,1822.6
    22995.5,1822.7
    23004.7,1822.7
    23014.1,1822.7
    23025.1,1822.7
    23034.2,1822.8
    23043.1,1822.9
    23049.8,1823.0
    23057.9,1823.2
    23064.0,1823.3
    23070.0,1823.5
    23078.7,1823.6
    23086.6,1823.7
    23096.3,1823.9
    23104.0,1824.0
    23112.9,1824.1
    23122.6,1824.1
    23131.5,1824.1
    23141.3,1824.1
    23153.3,1824.0
    23164.6,1824.1
    """
    

    Then I loaded that data into Python using pandas, performed the regression I suggested, and plotted the output using:

    from io import StringIO
    
    import numpy as np
    import scipy.stats
    import pandas as pd
    import matplotlib.pyplot as plt
    
    # get data
    df = pd.read_csv(StringIO(csvdata))
    
    # basic univariate linear regression
    res = scipy.stats.linregress(df.x, df.y)
    print(res)
    
    # estimate y at each x
    yp = res.intercept + df.x * res.slope
    # calculate the standard deviation of the residuals
    sd = np.std(yp - df.y)
    
    plt.plot(df.x, df.y, label="function")
    plt.plot(df.x, yp, label="slope that minimises residual")
    plt.title(f"SD of residual = {sd:.2f}")
    plt.legend()
    

    giving me:

    matplotlib output
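
Since the question asks about separate intervals, the same regression idea could be applied to each segment in turn. The following is only a rough sketch: `segment_residual_sds` is an illustrative name rather than a library function, the segment boundaries in `edges` are hypothetical, and the data is synthetic rather than the digitised points above. It uses `np.polyfit` with degree 1 for the straight-line fit.

```python
import numpy as np

def segment_residual_sds(x, y, edges):
    """Fit a least-squares line inside each [edges[i], edges[i+1]) interval
    and return the SD of the residuals for every segment."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    sds = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (x >= lo) & (x < hi)
        if mask.sum() < 3:
            # too few points for a meaningful fit in this segment
            sds.append(np.nan)
            continue
        # polyfit returns the highest-degree coefficient first
        slope, intercept = np.polyfit(x[mask], y[mask], 1)
        residuals = y[mask] - (intercept + slope * x[mask])
        sds.append(np.std(residuals))
    return np.array(sds)

# synthetic two-segment ramp with a small alternating wiggle (illustration only)
x = np.arange(12, dtype=float)
ramp = np.where(x < 6, 0.5 * x, 3.0 + 2.0 * (x - 6))
wiggle = np.where(np.arange(12) % 2 == 0, 0.1, -0.1)
y = ramp + wiggle

edges = [0, 6, 12]  # hypothetical segment boundaries
sds = segment_residual_sds(x, y, edges)
print(sds)
```

`np.std` divides by n by default (`ddof=0`). If what you want is the residual standard error of a straight-line fit, you might divide by n - 2 instead, which matters more when a segment contains only a few points.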