I want to find similar shapes in a time series using stumpy. However, some values seem to get special treatment that I do not understand.
Let me give you an example:
import numpy as np
import matplotlib.pyplot as plt
import stumpy
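# Flat series at 105 whose first 50 samples sit at 100 (a single step up)
ssss = 105 * np.ones(800)
ssss[:50] = 100
m = 210  # subsequence (window) length
mp = stumpy.stump(ssss, m=m)
plt.plot(ssss, color="blue")
plt.plot(mp[:, 0], color="orange")  # column 0 holds the matrix profile distances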
Results in
But clearly there are many parts where there is a perfect match after the jump, so the orange line, the distance, should be 0. Why is that not the case?
Surprisingly, if you change the 100 to 101 you get the result you would expect:
import numpy as np
import matplotlib.pyplot as plt
import stumpy
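# Same series, but the first 50 samples sit at 101 instead of 100
ssss = 105 * np.ones(800)
ssss[:50] = 101
m = 210  # subsequence (window) length
mp = stumpy.stump(ssss, m=m)
plt.plot(ssss, color="blue")
plt.plot(mp[:, 0], color="orange")  # column 0 holds the matrix profile distances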
What is an explanation for that?
I tried your first code using the latest development version.
import numpy as np
import matplotlib.pyplot as plt
import stumpy
T = 105 * np.ones(800)
T[:50] = 100
m = 210
mp = stumpy.stump(T, m=m)
plt.plot(T, color="blue")
plt.plot(mp[:,0], color="orange")
plt.show()
and I get this:
which makes sense, I think. I suggest that you start fresh, reinstall stumpy, and check the result again.
Regarding your second question:
If I change 100 to 101, I get the same figure as above [which makes sense I believe when the param normalize is set to True (default)]
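To illustrate why the step height should not matter when subsequences are z-normalized, here is a minimal NumPy sketch. It is not stumpy's internal code: the helper z_normalize and the two windows are made up for illustration, and it assumes the usual definition of z-normalization (subtract the mean, divide by the standard deviation).

```python
import numpy as np


def z_normalize(x):
    """Return (x - mean) / std. Assumes x is not constant, so std > 0."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()


m = 210  # same window length as in the question

# First window of each series: 50 low samples followed by 160 samples at 105.
window_100 = np.concatenate([np.full(50, 100.0), np.full(m - 50, 105.0)])
window_101 = np.concatenate([np.full(50, 101.0), np.full(m - 50, 105.0)])

# The raw Euclidean distance sees the different step heights.
raw_distance = np.linalg.norm(window_100 - window_101)

# After z-normalization both windows are the same unit-variance step,
# so the distance between them is zero up to floating-point error.
znorm_distance = np.linalg.norm(z_normalize(window_100) - z_normalize(window_101))

print("raw distance:", raw_distance)
print("z-normalized distance:", znorm_distance)
```

A perfectly flat window has zero standard deviation, so z-normalization is undefined for it and any implementation has to special-case such windows. If the absolute level should matter rather than only the shape, stumpy.stump also accepts normalize=False, which compares the raw subsequences instead.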