python data-visualization data-analysis weighted-average

Weighted average from an array in Python


I need to form a new sequence of numbers by replacing every data value, starting with the 4th entry and ending with the 4th from the last entry, with a weighted average of the seven points around it, using the following formula:

(y[i-3] + 2y[i-2] + 3y[i-1] + 3y[i] + 3y[i+1] + 2y[i+2] + y[i+3]) // 15 

(Note: i-3, i+1, and so on are subscripts, in case that wasn't apparent.)

Here is the code I have, which produces a raw graph, but I need to plot a new graph smoothed with the above formula. The data file produces an array of integers set up as [-24, 4, -4, -12, -52...]. I am not even sure where to begin with the formula, so any help would be appreciated.

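    from matplotlib import pyplot as plt

    # Read one integer per line from the data file
    with open('2_Record2308.dat', 'r') as f:
        data = [int(x) for x in f]

    graph = data

    fig, ax = plt.subplots()
    # A label is needed for ax.legend() to have something to show
    ax.plot(graph, label='Raw')
    ax.legend()
    ax.set_ylabel('Raw')
    plt.tight_layout()
    plt.show()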

Solution

  • This code should do the trick:

    avg = [(sum(y) + sum(y[1:-1]) + sum(y[2:-2])) // 15 
           for y in zip(data[:-6], data[1:-5], data[2:-4], data[3:-3], data[4:-2], data[5:-1], data[6:])] 
    

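    Here zip(data[:-6], data[1:-5], ...) yields each successive window of 7 consecutive values as a tuple y.

    sum(y) counts all 7 values once, sum(y[1:-1]) counts the inner 5 values a second time, and sum(y[2:-2]) counts the inner 3 values a third time. Together that gives the weights 1, 2, 3, 3, 3, 2, 1 from the formula, which add up to 15.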
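As a quick sanity check of that explanation, the sketch below compares the nested sums against an explicit weighted sum on a single window. The function names are made up for illustration, and the window is hypothetical: its first five values are the ones quoted in the question, the last two are invented.

```python
# Weights from the question's formula, for offsets i-3 .. i+3.
WEIGHTS = (1, 2, 3, 3, 3, 2, 1)


def weighted_sum_explicit(window):
    """Multiply each of the 7 values by its weight and add them up."""
    return sum(w * v for w, v in zip(WEIGHTS, window))


def weighted_sum_nested(window):
    """Same total via three nested sums, as in the expression above."""
    return sum(window) + sum(window[1:-1]) + sum(window[2:-2])


# Hypothetical 7-point window (last two values are invented).
window = (-24, 4, -4, -12, -52, 8, 16)

print(weighted_sum_explicit(window))  # -188
print(weighted_sum_nested(window))    # -188
```

Both functions return the same total, which is then divided by 15, the sum of the weights.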

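    By the way, adding 7 before the floor division by 15 rounds the result to the nearest integer, which is closer to a true average. In the original formulation the average is always rounded down (toward negative infinity, so negative values are affected too).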

    So, I would suggest (sum(y) + sum(y[1:-1]) + sum(y[2:-2]) + 7) // 15
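To make the rounding difference concrete, here is a small sketch comparing the two variants on a few weighted totals. The helper names and the sample totals are chosen purely for illustration.

```python
def floor_avg(total):
    """Original formulation: floor division always rounds down."""
    return total // 15


def rounded_avg(total):
    """Suggested variant: adding 7 first rounds to the nearest integer."""
    return (total + 7) // 15


# Illustrative weighted totals (not taken from the data file).
for total in (29, 22, -7, -188):
    print(total, round(total / 15, 3), floor_avg(total), rounded_avg(total))

# 29 / 15 is about 1.933: floor_avg gives 1, rounded_avg gives 2.
# -7 / 15 is about -0.467: floor_avg gives -1, rounded_avg gives 0.
```

The offset works because 15 is odd and 7 is half of 15 rounded down, so a remainder of 8 or more pushes the quotient up by one.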

    Here is a test based on your code and random-walk data.

    from matplotlib import pyplot as plt
    import random
    
    def do_averaging_7(data):
        return [(sum(y) + sum(y[1:-1]) + sum(y[2:-2]) + 7) // 15
                for y in zip(data[:-6], data[1:-5], data[2:-4], data[3:-3], data[4:-2], data[5:-1], data[6:])]
    
    data = [random.randrange(-100,101) for _ in range(100)]
    for i in range(1,len(data)):
        data[i] += data[i-1]
    avg = do_averaging_7(data)
    
    fig, ax = plt.subplots()
    ax.plot(range(len(data)), data, "blue")
    ax.plot(range(3, 3+len(avg)), avg, color="red")
    ax.set_ylabel('Raw')
    plt.tight_layout()
    plt.show()
    

    Resulting plot: [image: raw random-walk data in blue, smoothed data in red]
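Note that the list returned above is 6 entries shorter than the input, which is why it is plotted starting at x = 3. The question asks for a new sequence in which only the 4th through 4th-from-last entries are replaced, so one possible variant, sketched here with an explicit index loop, keeps the first and last three values unchanged. The name smooth_keep_ends and the handling of inputs shorter than 7 values are assumptions, not part of the original answer.

```python
def smooth_keep_ends(data):
    """Return a list as long as data; the first and last 3 entries are copied as-is."""
    smoothed = list(data)
    # With fewer than 7 values the range below is empty, so nothing is replaced.
    for i in range(3, len(data) - 3):
        total = (data[i - 3] + 2 * data[i - 2] + 3 * data[i - 1] + 3 * data[i]
                 + 3 * data[i + 1] + 2 * data[i + 2] + data[i + 3])
        # +7 rounds to the nearest integer instead of always rounding down.
        smoothed[i] = (total + 7) // 15
    return smoothed


# Hypothetical input: first five values from the question, last two invented.
print(smooth_keep_ends([-24, 4, -4, -12, -52, 8, 16]))
# [-24, 4, -4, -13, -52, 8, 16]
```

Because the result has the same length as the input, it can be passed to ax.plot directly without an x offset.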