Search code examples
pythonpandasdataframedata-processingdata-munging

pandas dataframe take rows before certain indexes


I have a dataframe and a list of indexes, and I want to get a new dataframe such that for each index (from the given last), I will take the all the preceding rows that matches in the value of the given column at the index.

      C1 C2 C3
0     1  2  A
1     3  4  A
2     5  4  A
3     7  5  B
4     9  7  C
5     2  3  D
6     1  1  D

The column c3 the indexes (row numbers) 2, 4 , 5 my new dataframe will be:

     C1 C2 C3
0     1  2  A
1     3  4  A
2     5  4  A
4     9  7  C
5     2  3  D

Explanation:

For index 2, rows 0,1,2 were selected because C3 equals in all of them.

For index 4, no preceding row is valid.

And for index 5 also no preceding row is valid, and row 6 is irrelevant because it is not preceding. What is the best way to do so?


Solution

  • you can make conditions to filter data,if you want just preceding rows match to condition.

    ind= 2
    col ='C3'
    # ".loc[np.arange(ind+1)]" creates indexes till preceding row, so rest of matching conditions can be ignored 
    df.loc[df.loc[ind][col] == df[col]].loc[np.arange(ind+1)].dropna()
    

    Out:

       C1   C2  C3
    0   1   2   A
    1   3   4   A
    2   5   4   A
    

    appying on other column

    ind= 2
    col ='C2'
    df.loc[df.loc[ind][col] == df[col]].loc[np.arange(ind+1)].dropna()
    

    Out:

       C1   C2  C3
    1   3.0 4.0 A
    2   5.0 4.0 A