python stumpy matrix-profile

Why is the behaviour of stumpy.stump changing so abruptly? Why is it unable to match constant intervals as the same shape?


I want to find similar shapes in a time series using stumpy. However, some values seem to get special treatment that I do not understand.

Let me give you an example:

import numpy as np
import matplotlib.pyplot as plt
import stumpy
ssss=105*np.ones(800)
ssss[:50]=100
m = 210
mp = stumpy.stump(ssss, m=m)

plt.plot(ssss, color="blue")
plt.plot(mp[:,0], color="orange")

This results in:

[Figure: wrong distance plot]

But after the jump there are clearly many windows that match perfectly, so the orange line (the distance) should be 0 there. Why is that not the case?

Surprisingly, if you change the 100 to 101 you get the result you would expect:

import numpy as np
import matplotlib.pyplot as plt
import stumpy
ssss=105*np.ones(800)
ssss[:50]=101
m = 210
mp = stumpy.stump(ssss, m=m)

plt.plot(ssss, color="blue")
plt.plot(mp[:,0], color="orange")

[Figure: distance correct]

What is an explanation for that?


Solution

  • I ran your first snippet with the latest development version of stumpy.

    import numpy as np
    import matplotlib.pyplot as plt
    import stumpy
    
    T = 105 * np.ones(800)
    T[:50] = 100
    m = 210
    
    mp = stumpy.stump(T, m=m)
    
    plt.plot(T, color="blue")
    plt.plot(mp[:,0], color="orange")
    plt.show()
    

    and I get this:

    [Figure: resulting plot, image not preserved]

    which I think makes sense. I suggest that you reinstall stumpy from scratch and check the result again.

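    Regarding your second question: if I change 100 to 101, I get the same figure as above. I believe that makes sense when the parameter normalize is set to True (the default): each window is z-normalized before distances are computed, so a step from 100 to 105 and a step from 101 to 105 have the same normalized shape.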
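To illustrate why the height of the step should not matter under z-normalization, here is a minimal sketch that uses only the standard library. It is not stumpy's implementation: `z_normalize` and `z_norm_distance` are hypothetical helper names introduced for this example. It compares the first window (m = 210) of the two series from the question.

```python
import math


def z_normalize(values):
    """Return (x - mean) / std for a non-constant sequence (population std)."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:
        # A constant window has zero standard deviation, so plain
        # z-normalization is undefined and needs special handling.
        raise ValueError("constant subsequence: z-normalization is undefined")
    return [(v - mean) / std for v in values]


def z_norm_distance(a, b):
    """Euclidean distance between the z-normalized versions of a and b."""
    za, zb = z_normalize(a), z_normalize(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(za, zb)))


m = 210
# First window of each series from the question: 50 low samples, then 105.
window_100 = [100.0] * 50 + [105.0] * (m - 50)
window_101 = [101.0] * 50 + [105.0] * (m - 50)

d = z_norm_distance(window_100, window_101)
print(d)  # should be 0 up to floating-point rounding
```

The sketch also shows why constant windows are a corner case: their standard deviation is zero, so a library has to special-case them. How stumpy does that may depend on the installed version, which would be consistent with the different plots reported above.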