python  time-series  segment

Time-series segmentation in Python


I am trying to segment time-series data as shown in the figure. I have a lot of sensor data, and each series can contain a different number of isolated peak regions. The one in the figure has three. I would like a function that takes the time series as input and returns the segmented sections, all of equal length.

My initial thought was to use a sliding window that calculates the relative change in amplitude. Since windows containing peaks will show relatively larger changes, I could define a threshold on the relative change and keep the windows that exceed it. However, choosing the threshold is a problem, because the relative change is very sensitive to noise in the data.
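A minimal sketch of the sliding-window idea, written to be less sensitive to noise: instead of a point-to-point relative change, it measures the peak-to-peak range inside each window and derives the threshold from the data itself (median plus a multiple of the median absolute deviation). The names `rolling_range` and `active_windows`, the window size, and the factor `k` are illustrative assumptions, and the approach assumes that most windows contain only noise.

```python
import numpy as np

def rolling_range(y, window):
    """Peak-to-peak amplitude (max - min) of every length-`window` slice of y."""
    views = np.lib.stride_tricks.sliding_window_view(y, window)  # NumPy >= 1.20
    return views.max(axis=1) - views.min(axis=1)

def active_windows(y, window=25, k=5.0):
    """Boolean mask over window start indices whose range is unusually large.

    Assumption: the majority of windows are noise-only, so the median of the
    rolling range describes the noise level.
    """
    r = rolling_range(np.asarray(y, dtype=float), window)
    med = np.median(r)
    mad = np.median(np.abs(r - med))  # robust spread of the noise-only windows
    return r > med + k * mad
```

The mask has `len(y) - window + 1` entries, one per window start. Both `window` and `k` still need tuning per sensor, but the threshold now scales with the noise level instead of being a fixed number.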

Any suggestions?

Figure: Desired segmentation of the time-series data.


Solution

  • To do this, you need to separate the signal from the noise.

    1. Take the mean value of your signal and add a margin, scaled by a multiplier, that places borders just above and just below the noise (green dashed lines).
    2. Find the values below the bottom noise border -> array of indexes, split into 2 groups (one per peak region).
    3. Find the values above the top noise border -> array of indexes, split into 2 groups.
    4. Take the min index of the first bottom group and the max index of the first top group to get the first peak range.
    5. Take the min index of the second top group and the max index of the second bottom group to get the second peak range.

    The code below illustrates this. The same method can be used to find further peaks. The one thing you need to input by hand is the x value between the peaks, which is used to split the data into parts.

    See graphic for summary.

    import numpy as np
    from matplotlib import pyplot as plt
    
    
    # noisy sine signals used to simulate the two peak regions
    def function(x, noise):
        y = np.sin(7*x+2) + noise
        return y
    
    def function2(x, noise):
        y = np.sin(6*x+2) + noise
        return y
    
    
    # synthetic data: 100 noise samples and two noisy sine bursts of 100 samples each
    noise = np.random.uniform(low=-0.3, high=0.3, size=(100,))
    x_line0 = np.linspace(1.95, 2.85, 100)
    y_line0 = function(x_line0, noise)
    x_pik = np.linspace(3.95, 5, 100)
    y_pik = function2(x_pik, noise)
    
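    # concatenate noise, first burst, noise, second burst, noise (500 samples);
    # x is the common plotting axis, x_line0 and x_pik only shape the sines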
    x = np.linspace(0, 6, 500)
    y = np.concatenate((noise, y_line0, noise, y_pik, noise), axis=0)
    
    # noise borders around the mean (green lines), then plot the data
    noise_band = 1.1
    top_noise = y.mean()+noise_band*np.amax(noise)
    bottom_noise = y.mean()-noise_band*np.amax(noise)
    fig, ax = plt.subplots()
    ax.axhline(y=y.mean(), color='red', linestyle='--')
    ax.axhline(y=top_noise, linestyle='--', color='green')
    ax.axhline(y=bottom_noise, linestyle='--', color='green')
    ax.plot(x, y)
    
    # split an array into the elements that satisfy cond and those that do not
    def split(arr, cond):
        return [arr[cond], arr[~cond]]
    
    # find indexes of the data below the bottom noise border
    bottom_data_indexes = np.argwhere(y < bottom_noise)
    # split at x = 3, a value between the two peaks chosen by eye
    splitted_bottom_data = split(bottom_data_indexes, bottom_data_indexes < np.argmax(x > 3))
    
    # find indexes of the data above the top noise border
    top_data_indexes = np.argwhere(y > top_noise)
    # split at the same x value
    splitted_top_data = split(top_data_indexes, top_data_indexes < np.argmax(x > 3))
    
    # get first signal range
    first_signal_start = np.amin(splitted_bottom_data[0])
    first_signal_end = np.amax(splitted_top_data[0])
    
    # get x values at the start and end of the first signal
    x_first_signal = np.take(x, [first_signal_start, first_signal_end])
    ax.axvline(x=x_first_signal[0], color='orange')
    ax.axvline(x=x_first_signal[1], color='orange')
    
    # get second signal range
    second_signal_start = np.amin(splitted_top_data[1])
    second_signal_end = np.amax(splitted_bottom_data[1])
    
    # get x values at the start and end of the second signal
    x_second_signal = np.take(x, [second_signal_start, second_signal_end])
    ax.axvline(x=x_second_signal[0], color='orange')
    ax.axvline(x=x_second_signal[1], color='orange')
    
    plt.show()
    

    Output:

    red line - mean value of all data

    green lines - top and bottom noise borders

    orange lines - selected peak ranges

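The manual split value can be avoided by grouping the out-of-band indexes automatically, and the equal-length sections asked for in the question can then be cut around each group. The following is only a sketch under stated assumptions, not part of the code above: `find_peak_regions`, `equal_length_segments`, `max_gap`, and `length` are hypothetical names and parameters, and `max_gap` must be larger than the gaps inside one peak region but smaller than the distance between regions.

```python
import numpy as np

def find_peak_regions(y, lower, upper, max_gap=50):
    """Return (start, end) index pairs of regions where y leaves [lower, upper].

    Out-of-band samples that are at most `max_gap` indexes apart are merged
    into the same region.
    """
    y = np.asarray(y)
    idx = np.flatnonzero((y < lower) | (y > upper))
    if idx.size == 0:
        return []
    # a new region starts wherever consecutive out-of-band indexes are far apart
    breaks = np.flatnonzero(np.diff(idx) > max_gap)
    starts = np.concatenate(([idx[0]], idx[breaks + 1]))
    ends = np.concatenate((idx[breaks], [idx[-1]]))
    return list(zip(starts.tolist(), ends.tolist()))

def equal_length_segments(y, regions, length):
    """Cut one window of `length` samples centred on each region.

    Windows are shifted inward at the array edges; assumes length <= len(y).
    """
    y = np.asarray(y)
    segments = []
    for start, end in regions:
        centre = (start + end) // 2
        lo = max(0, min(centre - length // 2, len(y) - length))
        segments.append(y[lo:lo + length])
    return segments
```

With the variables from the code above, `regions = find_peak_regions(y, bottom_noise, top_noise)` followed by `equal_length_segments(y, regions, 120)` would give one 120-sample section per detected region. Windows of neighbouring regions can overlap if `length` is larger than the spacing between them.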