I have a dataframe whereby if a condition is satisfied, it looks a couple rows above to find a specific row with another string and replaces itself with that entire cell which had the string.
Here is an Example:
If "test" is detected in the type column, it looks up to 3 rows above for a row which contains the string "Subject". If that condition is satisfied, it then changes it's cell name to the entire cell it found.
The data is :
Type Score
Ignore Ignore
Subject_English NA
test 58
test 6
test 87
Subject_science NA
test 23
test 47
test 89
test 45
test 23
ignore ignore
ignore ignore
ignore ignore
test 54
Subject_Math NA
Ignore ignore
Ignore ignore
test 88
Lastly, if multiple rows satisfy the initial condition it takes the latest. Secondly, once a cell has been changed, that cell can be used to satisfy the first condition for another cell. Thirdly, there might be a few rows in between the "test" row and the row Im looking to satisfy the condition I need to ignore. Further notes: The number of tests can vary, it could be 3 or 10. You can have ignores between the subjects. If possible, I'd like to add a threshold that if there wasn't a subject in the last 10 rows, then leave it as test
How to look back a few rows higher to see if a condition is satisfied and then replace a cell accordingly
Trying to "look back" is certainly possible, but it adds a lot of complexity. Instead, iterate over the rows from the top down. Whenever you come across a cell whose value starts with "Subject", set a variable current_subject
to that value. Whenever you come across a cell whose value is "test", set the value of that cell to the value in current_subject
.
I'd like to add a threshold that if there wasn't a subject in the last 10 rows, then leave it as test
OK, so create another variable called nonsubject_rows
. Set it to 0 whenever you come across a subject row, and perhaps also whenever you set a test row to the current subject; otherwise, increment it for each row you visit. Before you set the value of a test row, check to see whether nonsubject_rows
is greater than the threshold; if it is, don't change the row.
In short, you can keep track of the information you need in order to process each row as you go. That will be a lot simpler and faster than searching for the information every time you need it.