python pandas conditional-statements mask

Complex mask for dataframe


I have a dataframe with a time series in a single column. The data looks like this chart:

[image: input]

I would like to create a mask that is TRUE whenever the data is equal to or lower than -0.20. It should also be TRUE before the data reaches -0.20 while it is negative, and after it reaches -0.20 while it is still negative. This version of the chart:

[image: output]

is my manual attempt to show (in red) the values where the mask would be TRUE. I started creating the mask, but I could only make it TRUE while the data is less than -0.20: mask = (df['data'] < -0.2). I couldn't do any better. Does anybody know how to achieve my goal?


Solution

  • One approach is to group the data into contiguous segments that lie entirely below zero, and then check for each segment whether it contains any values below -0.2.


    See below for a full reproducible example script:

    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd
    
    
    np.random.seed(167)
    
    df = pd.DataFrame(
        {"y": np.cumsum([np.random.uniform(-0.01, 0.01) for _ in range(10 ** 5)])}
    )
    plt.plot(df)
    
    # label contiguous runs: the label increments each time the sign flag flips
    lt_zero = df["y"] < 0
    regions = (lt_zero != lt_zero.shift()).cumsum()
    
    # keep only the rows of negative runs whose minimum drops below -0.2
    df_interesting = df.groupby(regions).filter(lambda g: g["y"].min() < -0.2)
    
    # plot individual regions
    for i, grp in df.groupby(regions):
        if grp["y"].min() < -0.2:
            plt.plot(grp, color="tab:red", linewidth=5, alpha=0.6)
    
    plt.axhline(0, linestyle="--", color="tab:gray")
    plt.axhline(-0.2, linestyle="--", color="tab:gray")
    plt.show()
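
The script above returns the filtered rows rather than a boolean mask. If a mask aligned with the original index is what you need, one possible sketch is to broadcast each run's minimum back onto its rows with groupby(...).transform("min"). The tiny hand-made series and the names below_zero, run_min and mask are assumptions for illustration only, and <= is used here to match the "equal or lower than -0.20" wording of the question:

```python
import pandas as pd

# Hypothetical toy data with three negative runs:
# one dips below -0.2, one stays shallow, one touches -0.2 exactly.
df = pd.DataFrame(
    {"y": [0.05, -0.05, -0.25, -0.10, 0.02, -0.08, -0.03, 0.04, -0.20, 0.01]}
)

# Flag negative values and label contiguous runs of equal flags.
below_zero = df["y"] < 0
regions = (below_zero != below_zero.shift()).cumsum()

# Minimum of each run, repeated on every row of that run.
run_min = df["y"].groupby(regions).transform("min")

# True only for negative rows whose run reaches -0.2 or lower.
mask = below_zero & (run_min <= -0.2)

print(df.assign(mask=mask))
```

With this mask, df[mask] selects the highlighted rows and df["y"].where(mask) keeps them in place with NaN elsewhere, which can be convenient for plotting.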