python pandas dataframe data-processing data-munging

pandas dataframe take rows before certain indexes

I have a dataframe and a list of indexes, and I want to get a new dataframe such that for each index (from the given last), I will take the all the preceding rows that matches in the value of the given column at the index.

      C1 C2 C3
0     1  2  A
1     3  4  A
2     5  4  A
3     7  5  B
4     9  7  C
5     2  3  D
6     1  1  D

The column c3 the indexes (row numbers) 2, 4 , 5 my new dataframe will be:

     C1 C2 C3
0     1  2  A
1     3  4  A
2     5  4  A
4     9  7  C
5     2  3  D

Explanation:

For index 2, rows 0,1,2 were selected because C3 equals in all of them.

For index 4, no preceding row is valid.

And for index 5 also no preceding row is valid, and row 6 is irrelevant because it is not preceding. What is the best way to do so?

Solution

you can make conditions to filter data,if you want just preceding rows match to condition.

ind= 2
col ='C3'
# ".loc[np.arange(ind+1)]" creates indexes till preceding row, so rest of matching conditions can be ignored 
df.loc[df.loc[ind][col] == df[col]].loc[np.arange(ind+1)].dropna()

Out:

   C1   C2  C3
0   1   2   A
1   3   4   A
2   5   4   A

appying on other column

ind= 2
col ='C2'
df.loc[df.loc[ind][col] == df[col]].loc[np.arange(ind+1)].dropna()

Out:

   C1   C2  C3
1   3.0 4.0 A
2   5.0 4.0 A