I have a dataframe and a list of indexes, and I want to get a new dataframe such that for each index (from the given last), I will take the all the preceding rows that matches in the value of the given column at the index.
C1 C2 C3
0 1 2 A
1 3 4 A
2 5 4 A
3 7 5 B
4 9 7 C
5 2 3 D
6 1 1 D
The column c3 the indexes (row numbers) 2, 4 , 5 my new dataframe will be:
C1 C2 C3
0 1 2 A
1 3 4 A
2 5 4 A
4 9 7 C
5 2 3 D
Explanation:
For index 2, rows 0,1,2 were selected because C3 equals in all of them.
For index 4, no preceding row is valid.
And for index 5 also no preceding row is valid, and row 6 is irrelevant because it is not preceding. What is the best way to do so?
you can make conditions to filter data,if you want just preceding rows match to condition.
ind= 2
col ='C3'
# ".loc[np.arange(ind+1)]" creates indexes till preceding row, so rest of matching conditions can be ignored
df.loc[df.loc[ind][col] == df[col]].loc[np.arange(ind+1)].dropna()
Out:
C1 C2 C3
0 1 2 A
1 3 4 A
2 5 4 A
appying on other column
ind= 2
col ='C2'
df.loc[df.loc[ind][col] == df[col]].loc[np.arange(ind+1)].dropna()
Out:
C1 C2 C3
1 3.0 4.0 A
2 5.0 4.0 A