
Lengths must match to compare


How can I match length of DataFrame when comparing indexes?

df[df.index > df[df.a == 5].index] (the shape of df is dynamic)

Example:

import numpy as np
import pandas as pd

df = pd.DataFrame([[0, 10], [5, 10], [0, 10], [5, 10], [0, 10], [0, 10]], columns=["a", "b"])

m = df.index > df[df.a == 5].index  # ValueError: Lengths must match to compare
df.loc[m, 'b'] -= np.arange(1, m.sum() + 1)

Desired result:

   a   b
0  0  10
1  5   9
2  0   9
3  5   8
4  0   8
5  0   8

Solution

  • The error happens because the length of df[df.a == 5] does not match the length of df, so the following expression:

    df.index > df[df.a == 5].index
    

    is invalid. Both sides have to be the same length (or one side must be a scalar), but here df has 6 rows and df[df.a == 5] has only 2, so pandas raises ValueError: Lengths must match to compare.

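    It seems you want to group rows by each position where "a" equals 5 and subtract the group number from "b". In that case, you can use eq + cumsum instead: eq(5) marks the rows where "a" is 5, and cumsum turns those marks into a running count, so every row gets the number of 5s at or above it: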

    df['b'] -= df['a'].eq(5).cumsum()
    

    Output:

       a   b
    0  0  10
    1  5   9
    2  0   9
    3  5   8
    4  0   8
    5  0   8
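A minimal sketch, assuming pandas and NumPy are installed, that reproduces the error and then checks the eq + cumsum answer against an alternative added here (not part of the original answer): explicit NumPy broadcasting of a (6, 1) column of row labels against a (1, 2) row of the labels where "a" is 5, which is one way to make the "lengths match" by comparing every row with every 5. The variable names fives, groups, pairwise and result_* are illustrative only.

```python
import numpy as np
import pandas as pd

# Same frame as in the question: "a" == 5 at index labels 1 and 3.
df = pd.DataFrame(
    [[0, 10], [5, 10], [0, 10], [5, 10], [0, 10], [0, 10]],
    columns=["a", "b"],
)

# 1) Reproduce the error: a 6-element Index compared with a 2-element Index.
fives = df[df["a"] == 5].index  # labels 1 and 3
try:
    _ = df.index > fives
    error = None
except ValueError as exc:
    error = exc

# Comparing against a single scalar label works, e.g. "rows after the first 5".
after_first = df.index > fives[0]  # NumPy boolean array, one entry per row

# 2) The answer's approach: running count of 5s seen so far (inclusive).
groups = df["a"].eq(5).cumsum()
result_cumsum = df["b"] - groups

# 3) Alternative sketch (not from the answer): pairwise comparison via NumPy
# broadcasting. (6, 1) row labels >= (1, 2) "five" labels -> (6, 2) booleans;
# summing each row counts the 5s at or above that row.
# Assumes labels increase with row position, e.g. the default RangeIndex.
pairwise = df.index.to_numpy()[:, None] >= fives.to_numpy()[None, :]
result_broadcast = df["b"] - pairwise.sum(axis=1)

print(error)
print(result_cumsum.tolist())  # [10, 9, 9, 8, 8, 8]
print(result_broadcast.tolist())  # [10, 9, 9, 8, 8, 8]
```

The cumsum version works on row order, while the broadcasting version compares index labels, so the two agree only when labels increase with position. The cumsum version also stays linear in the number of rows instead of building an n-by-k boolean matrix. Neither line modifies df; assign the result back (df["b"] = result_cumsum) or use the answer's in-place -= form.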