Tags: python, pandas, time-series, cluster-analysis, signal-processing

How to split time series in clusters by different patterns?


This is a small example of a larger dataset made up of many dataframes similar to the one below (df_final):

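import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# First pattern: roughly constant spread between 0 and 2000 m.
df1 = pd.DataFrame({"DEPTH (m)": np.arange(0, 2000, 2),
                    "SIGNAL": np.random.uniform(low=-6, high=10, size=(1000,))})

# Second pattern: spread that grows with depth between 2000 and 3000 m.
df2 = pd.DataFrame({"DEPTH (m)": np.arange(2000, 3000, 2),
                    "SIGNAL": np.random.uniform(low=0, high=5, size=(500,))})

# Scale each sample by its row index so the amplitude increases with depth.
for i, row in df2.iterrows():
    df2.loc[i, "SIGNAL"] = row["SIGNAL"] * (i / 100)

df_final = pd.concat([df1, df2])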

You can see that this signal has two patterns (one roughly constant and the other increasing in amplitude):

plt.figure()
plt.plot(df_final["SIGNAL"], df_final["DEPTH (m)"], linewidth=0.5)

# Reverse the y-limits so that depth increases downward.
plt.ylim(df_final["DEPTH (m)"].max(), df_final["DEPTH (m)"].min())

plt.xlabel("SIGNAL")
plt.ylabel("DEPTH (m)")
plt.show()


Is there a way to automatically create a flag/cluster that splits this signal? In this example I would have one cluster before depth 2000 and another after it.

Another problem is that my project has other dataframes with more than two signal patterns, and there are too many of them to set the split points manually for each one.


Solution

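  • One possibility is to use a rolling standard deviation: compute a forward-looking rolling std of the signal, take its difference, and start a new group wherever the std drops sharply: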

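    # Forward-looking rolling std: reverse, roll over 20 samples, reverse back
    s1 = df_final.loc[::-1, 'SIGNAL'].rolling(20).std()[::-1]
    # Sample-to-sample change of the rolling std
    s2 = s1.diff()
    
    N = 2 # number of groups
    # Flag the most negative changes (those below a very low quantile of s2)
    m = s2.lt(s2.quantile((N-1)/len(df_final)))
    
    # Start a new group at each False -> True transition of the mask
    groups = (m&~m.shift(fill_value=False)).cumsum()
    
    f, (ax, ax1, ax2) = plt.subplots(ncols=3, sharey=True)
    
    # Left panel: the signal, drawn separately for each detected group
    for k, g in df_final.groupby(groups):
        g.plot(x='SIGNAL', y='DEPTH (m)', ax=ax, lw=0.5, label=f'group {k+1}')
    
    # Middle and right panels: the rolling std and its diff against depth
    ax1.plot(s1, df_final['DEPTH (m)'])
    ax2.plot(s2, df_final['DEPTH (m)'])
        
    # The y-axis is shared, so inverting it once puts depth downward everywhere
    ax.invert_yaxis()
    
    ax.set_title('data')
    ax1.set_title('rolling std')
    ax2.set_title('diff')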
    

    Output: a three-panel figure sharing the depth axis, with panels titled 'data' (the signal drawn per group), 'rolling std', and 'diff'.
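
For many dataframes with more than two patterns, the same rolling-std idea could be wrapped in a function. The sketch below is a variation, not the code from the answer above: instead of thresholding the diff of a forward-looking std, it compares the std of the `window` samples before each position with the std of the `window` samples after it, and cuts at the `n_groups - 1` largest differences that are at least `window` samples apart. The name `split_by_volatility` and its parameters are hypothetical, and the number of groups still has to be supplied by the caller.

```python
import numpy as np
import pandas as pd


def split_by_volatility(signal, window=20, n_groups=2):
    """Hypothetical helper: label samples 0..n_groups-1, cutting where the
    local spread (rolling std) changes the most."""
    s = pd.Series(np.asarray(signal, dtype=float))

    # Std of the `window` samples ending just before each position.
    before = s.rolling(window).std().shift(1)
    # Std of the `window` samples starting at each position (forward-looking).
    after = s[::-1].rolling(window).std()[::-1]

    # Change score: large where the spread differs across the position.
    # Positions too close to either end have no full window and score 0.
    score = (after - before).abs().fillna(0.0).to_numpy()

    # Greedily keep the strongest cuts that are at least `window` apart.
    cuts = []
    for pos in np.argsort(score)[::-1]:
        if len(cuts) == n_groups - 1:
            break
        if score[pos] > 0 and all(abs(pos - c) >= window for c in cuts):
            cuts.append(int(pos))

    # Every cut raises the label of all samples from that position onward.
    labels = np.zeros(len(s), dtype=int)
    for c in cuts:
        labels[c:] += 1
    return labels


# Deterministic demo: the spread jumps from 1 to 10 at sample 100.
demo = np.where(np.arange(200) % 2 == 0, 1.0, -1.0)
demo[100:] *= 10
print(np.bincount(split_by_volatility(demo)))  # [100 100]
```

Assuming `df_final` from the question, `df_final["GROUP"] = split_by_volatility(df_final["SIGNAL"])` would attach the labels positionally. On noisy data the cut should land near the true transition rather than exactly on it, and `window` will likely need tuning per dataset.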