python pandas numpy scipy signal-processing

How to find series of highest peaks of a repeating pattern using find_peaks() in Python?

I'm trying to determine the highest peaks of the pattern blocks in the following waveform:

Basically, I need to detect the following peaks only (highlighted):

If I use scipy.find_peaks(), it's unable to detect the appropriate peaks:

indices = find_peaks(my_waveform, prominence = 1)[0]

It ends up detecting all of the following points, which is not what I am looking for:

I can't provide the input arguments of distance or height thresholds to scipy.find_peaks() since there are many of the desired peaks on either extremes which are lower in height than the non-desired peaks in the middle.

Note: I had de-trended the waveform in order to aid this above problem too as you can see in the above snapshot, but it still doesn't give the right results.

So can anyone help with a correct way to tackle this?

Here's the code to fully reproduce the dataset I've shown ("autocorr" is the final waveform of interest)

import json
import sys, os
import numpy as np
import pandas as pd
import glob
import pickle

from statsmodels.tsa.stattools import adfuller, acf, pacf
from scipy.signal import find_peaks, square
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
import matplotlib.pyplot as plt

#GENERATION OF A FUNCTION WITH DUAL SEASONALITY & NOISE

def white_noise(mu, sigma, num_pts):
    """ Function to generate Gaussian Normal Noise
    Args:
        sigma: std value
        num_pts: no of points
        mu: mean value

    Returns:
        generated Gaussian Normal Noise
    """
    
    noise = np.random.normal(mu, sigma, num_pts)
    return noise

def signal_line_plot(input_signal: pd.Series, title: str = "", y_label: str = "Signal"):
    """ Function to plot a time series signal
    Args:
        input_signal: time series signal that you want to plot
        title: title on plot
        y_label: label of the signal being plotted
        
    Returns:
        signal plot
    """
    
    plt.plot(input_signal)
    plt.title(title)
    plt.ylabel(y_label)
    plt.show()

t_week = np.linspace(1,480, 480)
t_weekend=np.linspace(1,192,192)
T=96 #Time Period
x_weekday = 10*square(2*np.pi*t_week/T, duty=0.7)+10 + white_noise(0, 1,480)
x_weekend = 2*square(2*np.pi*t_weekend/T, duty=0.7)+2 + white_noise(0,1,192)
x_daily_weekly = np.concatenate((x_weekday, x_weekend)) 
x_daily_weekly_long = np.concatenate((x_daily_weekly,x_daily_weekly,x_daily_weekly,x_daily_weekly,x_daily_weekly,x_daily_weekly,x_daily_weekly,x_daily_weekly,x_daily_weekly,x_daily_weekly))
signal_line_plot(x_daily_weekly_long)
signal_line_plot(x_daily_weekly_long[0:1000])

#x_daily_weekly_long is the final waveform on which I'm carrying out Autocorrelation

#PERFORMING AUTOCORRELATION:
import scipy.signal as signal

autocorr = signal.correlate(x_daily_weekly_long, x_daily_weekly_long, mode = "same")
lags = signal.correlation_lags(len(x_daily_weekly_long), len(x_daily_weekly_long), mode = "same")

#VISUALIZATION:
f = plt.figure()
f.set_figwidth(40)
f.set_figheight(10)
plt.plot(lags, autocorr)

Solution

As you have some kind of double (or even triple) signal, I would attempt a double smoothing.

One to remove the overall trend, and one to remove the sharp noise.

A picture is probably better than a long explanation:

from scipy.signal import find_peaks
import pandas as pd
import numpy as np

def smooth(s, win):
    return pd.Series(s).rolling(window=win, center=True).mean().ffill().bfill()

plt.plot(lags, autocorr, label='data')

WINDOW = 100   # needs to be determined empirically
               # and so are the multipliers below

# double smoothing difference + clipping
ddiff = np.clip(smooth(autocorr, 2*WINDOW)-smooth(autocorr, 10*WINDOW), 0, np.inf)
plt.plot(lags, ddiff, label='smooth+clip')

peaks = find_peaks(ddiff, width=WINDOW)[0]
plt.plot(lags[peaks], autocorr[peaks], marker='o', ls='')
plt.plot(lags[peaks], ddiff[peaks], marker='o', ls='')

plt.legend()

output:

smoothing the original signal

As often in data analysis, the earlier you perform a transformation might be the better. You could also clean your original signal before running the autocorrelation. Here is a quick example (using the smooth function defined above):

from scipy.signal import find_peaks

x2 = smooth(x_daily_weekly_long, 100)
autocorr2 = signal.correlate(x2, x2, mode = "same")

plt.plot(lags, autocorr2)
idx = find_peaks(autocorr2)[0]
plt.plot(lags[idx], autocorr2[idx], marker='o', ls='')

cleaned signal: