I have a dataset like the one below:
Category,Date,Id,Amount
Risk A,11/12/2020,1,-10
Risk A,11/13/2020,2,10
Risk A,11/14/2020,3,22
Risk A,11/15/2020,4,32
Total Risk A : 4 ----- needs to be removed
Risk C,11/9/2020,5,43
Risk C,11/10/2020,6,22
Risk C,11/11/2020,7,11
Risk C,11/12/2020,8,-50
Total Risk C : 4 ----- needs to be removed
Risk D,11/12/2020,9,3
Risk D,11/13/2020,10,1
Risk D,11/14/2020,11,3
Risk D,11/15/2020,12,4
Risk D,11/9/2020,13,55
Risk D,11/10/2020,14,32
Total Risk C : 6 ----- needs to be removed
Between the data rows there are some specific total (summary) rows, which I need to remove from the file. I'm looking for a better way to remove these rows without iterating over the file line by line in Python, since the file has a few thousand rows and removing the summary lines that way takes time. Any suggestions?
You can use a regular expression to perform a string substitution on the whole text at once:
import re
t = """Category,Date,Id,Amount
Risk A,11/12/2020,1,-10
Risk A,11/13/2020,2,10
Risk A,11/14/2020,3,22
Risk A,11/15/2020,4,32
Total Risk A : 4 ----- needs to be removed
Risk C,11/9/2020,5,43
Risk C,11/10/2020,6,22
Risk C,11/11/2020,7,11
Risk C,11/12/2020,8,-50
Total Risk C : 4 ----- needs to be removed
Risk D,11/12/2020,9,3
Risk D,11/13/2020,10,1
Risk D,11/14/2020,11,3
Risk D,11/15/2020,12,4
Risk D,11/9/2020,13,55
Risk D,11/10/2020,14,32
Total Risk C : 6 ----- needs to be removed"""
# Remove each line that starts with "Total", together with the newline before it
print(re.sub(r'\nTotal.*', '', t))
re.sub finds every part of the text that matches the pattern r'\nTotal.*' (a newline followed by the word "Total", followed by any characters up to the end of that line, because . does not match a newline by default) and replaces it with ''.
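To apply the same idea to a file on disk, you can read the whole file into one string, substitute, and write the result back out. The sketch below is a hypothetical helper (the name strip_total_rows and the input/output paths are assumptions, not part of the answer above). It swaps in a slightly different pattern, ^Total.*\n? with re.MULTILINE, so that a summary row on the very first line is also caught and the row's own trailing newline is removed with it. It also uses Pattern.subn, which returns the new string together with the number of substitutions, to report how many rows were dropped. Like the original, it assumes every summary row starts with "Total" and no data row does.

```python
import re
import tempfile
from pathlib import Path

# A whole line starting with "Total", plus its trailing newline if there is one.
# re.MULTILINE makes ^ match at the start of every line, not only the first.
TOTAL_ROW = re.compile(r"^Total.*\n?", re.MULTILINE)


def strip_total_rows(src: str, dst: str) -> int:
    """Hypothetical helper: copy src to dst without the "Total ..." summary rows.

    Returns the number of rows removed. The whole file is read into memory,
    which should be fine for a file of a few thousand rows.
    """
    text = Path(src).read_text()
    cleaned, removed = TOTAL_ROW.subn("", text)
    Path(dst).write_text(cleaned)
    return removed


if __name__ == "__main__":
    # Sample rows taken from the question; the file names are made up.
    sample = (
        "Category,Date,Id,Amount\n"
        "Risk A,11/12/2020,1,-10\n"
        "Risk A,11/13/2020,2,10\n"
        "Risk A,11/14/2020,3,22\n"
        "Risk A,11/15/2020,4,32\n"
        "Total Risk A : 4\n"
    )
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp, "input.csv")
        dst = Path(tmp, "output.csv")
        src.write_text(sample)
        print(strip_total_rows(str(src), str(dst)))  # 1
        print(dst.read_text(), end="")
```

Because the substitution runs over the whole string in one call, there is no explicit Python loop over the lines.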