python, pandas, time-series

Does ruptures index from 0 or 1?


I use the ruptures module to detect changes in trends, etc. The last breakpoint it returns is 1 greater than the last index of my data (it equals the data length), so it looks as if it indexes from 1 instead of 0, or adds 1 to the last value.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt  # for display purposes
import ruptures as rpt  # our package

n_samples, n_dims, sigma = 1000, 3, 2
n_bkps = 4  # number of breakpoints
signal, bkps = rpt.pw_constant(n_samples, n_dims, n_bkps, noise_std=sigma)

print(len(signal))  # 1000
print(bkps)         # [217, 424, 629, 810, 1000]  (random, varies between runs)

I don't understand: should I just drop the last value, or subtract 1 from every value?


Solution

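  • From what I understand, each value in bkps is the 0-based index of the point just after a breakpoint. With a range index, this means the change happens between that point and the previous one. The last value is always equal to n_samples: it marks the end of the final segment rather than a real change, which is why it is one past the last valid index.

    Indeed, if we plot the breakpoints manually, we need to subtract 0.5 from each x-value to align them with the segment boundaries drawn by rpt.display: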

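    import matplotlib.pyplot as plt
    import ruptures as rpt

    # Small 2-D piecewise-constant signal with 4 breakpoints (seeded for reproducibility)
    n_samples, n_dims, sigma = 100, 2, 2
    n_bkps = 4  # number of breakpoints
    signal, bkps = rpt.pw_constant(n_samples, n_dims, n_bkps, noise_std=sigma, seed=0)

    # rpt.display shades each segment and returns the figure and its axes
    (f, axes) = rpt.display(signal, bkps)

    # Draw a dotted line half a sample before each breakpoint, i.e. between
    # the last point of one segment and the first point of the next
    for x in bkps:
        for ax in axes:
            ax.axvline(x - 0.5, ls=':', c='k')

    plt.show()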

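To answer the "delete or subtract 1" question concretely, here is a minimal sketch that treats the breakpoint list as 0-based, exclusive segment ends whose last value equals the number of samples. The bkps values are copied from the question rather than produced by ruptures, so the example runs with only NumPy and pandas; the random signal, the daily DatetimeIndex, and the names starts, segments, change_points and segment_ends are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Breakpoint list copied from the question (assumed ruptures-style output):
# each value is the 0-based index of the first sample of the next segment,
# and the last value equals the number of samples.
n_samples = 1000
bkps = [217, 424, 629, 810, 1000]
signal = np.random.default_rng(0).normal(size=(n_samples, 3))  # placeholder data

# The values work directly as slice stops, so no "-1" is needed to split the signal.
starts = [0] + bkps[:-1]
segments = [signal[start:stop] for start, stop in zip(starts, bkps)]

# The real change points are every value except the last one.
change_points = bkps[:-1]

# Subtract 1 only if you want the last index *inside* each segment.
segment_ends = [b - 1 for b in bkps]

# With a (hypothetical) daily DatetimeIndex, the first date of each new segment:
index = pd.date_range("2020-01-01", periods=n_samples, freq="D")
change_dates = index[change_points]

print([len(s) for s in segments])  # [217, 207, 205, 181, 190]
print(change_dates[0])             # 2020-08-05 00:00:00
```

In other words, neither edit is needed for slicing: drop the last value only when you want the list of actual change points, and subtract 1 only when you want each segment's last index.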