python pandas time-series cluster-analysis signal-processing

How to split time series in clusters by different patterns?

This is an example of a larger data with many dataframes similar to this one below (df_final):

df1 = pd.DataFrame({"DEPTH (m)":np.arange(0, 2000, 2),
                    "SIGNAL":np.random.uniform(low=-6, high=10, size=(1000,))})        

df2 = pd.DataFrame({"DEPTH (m)":np.arange(2000, 3000, 2),
                    "SIGNAL":np.random.uniform(low=0, high=5, size=(500,))}) 


for i, row in df2.iterrows():
    df2.loc[i, "SIGNAL"] = row["SIGNAL"] * (i / 100)

df_final = pd.concat([df1, df2])

You can see that this signal has two patterns (one "constant" and other increasing):

plt.figure()
plt.plot(df_final["SIGNAL"], df_final["DEPTH (m)"], linewidth=0.5)

plt.ylim(df_final["DEPTH (m)"].max(), df_final["DEPTH (m)"].min())

plt.xlabel("SIGNAL")
plt.ylabel("DEPTH")

Is there a way I can automatically create a flag/cluster to split this signal? In this example I would have one cluster before depth 2000 and other after it.

Another problem is that, in my project, I will have other dataframes with more than two signal patterns and couldn't set it manually for each dataframe as there are many.

Solution

One possibility using a rolling standard deviation:

s1 = df_final.loc[::-1, 'SIGNAL'].rolling(20).std()[::-1]
s2 = s1.diff()

N = 2 # number of groups
m = s2.lt(s2.quantile((N-1)/len(df_final)))

groups = (m&~m.shift(fill_value=False)).cumsum()

f, (ax, ax1, ax2) = plt.subplots(ncols=3, sharey=True)

for k, g in df_final.groupby(groups):
    g.plot(x='SIGNAL', y='DEPTH (m)', ax=ax, lw=0.5, label=f'group {k+1}')

ax1.plot(s1, df_final['DEPTH (m)'])
ax2.plot(s2, df_final['DEPTH (m)'])
    
ax.invert_yaxis()

ax.set_title('data')
ax1.set_title('rolling std')
ax2.set_title('diff')

Output: