I'm writing a script that converts text from PDF documents and formats it as CSV to be used later on. I've hit an issue where I need to add missing information to certain lines to complete the data, and I don't know how to achieve it with sed. The document looks like so:
# "date","description","cost","total"
"31 01 19","Purchase from SHOP","1.23","1.23"
"Direct debit to COMPANY","2.34","3.57"
"Purchase from SHOP","3.45","7.02"
"01 02 19","Received from PERSON","1.23","5.79"
"Purchase to SHOP","4.56","10.35"
When it should look like this:
# "date","description","cost","total"
"31 01 19","Purchase from SHOP","1.23","1.23"
"31 01 19","Direct debit to COMPANY","2.34","3.57"
"31 01 19","Purchase from SHOP","3.45","7.02"
"01 02 19","Received from PERSON","1.23","5.79"
"01 02 19","Purchase to SHOP","4.56","10.35"
How could I achieve this with sed?
I have tried:
/^(\"[[:digit:]]{2} [[:digit:]]{2} [[:digit:]]{2}\",)/{
h
N
/^(\"[^\"]*\",\"(0|[1-9][[:digit:]]{,2}(,[[:digit:]]{1,3})*)\.[[:digit:]]{2})\",?{2})/{
G
s/((.*))\n((.*))/\2,\1/
}
}
But that does not seem to do anything, even with the regular expressions tested to ensure they match what I'm after. Am I doing something wrong here or is there a better way to do this?
This might work for you (GNU sed):
sed -E 'N;/\n".. .. .."/!s/^([^,]+,).*\n/&\1/;P;D' file
Append the following line and, if it does not start with a date, insert the previous line's date in front of it. Then print and delete the first line in the pattern space and repeat.
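To make the steps easier to follow, here is the same one-liner spread over several lines and run against the sample from the question. Treat it as a sketch rather than a definitive script: it assumes GNU sed (for `-E`, and because GNU sed prints the pattern space when `N` finds no further input), and the temporary file and the `input` and `result` variable names are only there for illustration.

```shell
# Sample input from the question, written to a temporary file (illustrative only).
input=$(mktemp)
cat > "$input" <<'EOF'
# "date","description","cost","total"
"31 01 19","Purchase from SHOP","1.23","1.23"
"Direct debit to COMPANY","2.34","3.57"
"Purchase from SHOP","3.45","7.02"
"01 02 19","Received from PERSON","1.23","5.79"
"Purchase to SHOP","4.56","10.35"
EOF

# Same commands as the one-liner, one per line (GNU sed assumed).
result=$(sed -E '
  # Append the next input line, so the pattern space holds "current\nnext".
  N
  # If the appended line does not start with a quoted "dd dd dd" date,
  # copy the first field of the current line (its date plus the comma)
  # to the front of the appended line.
  /\n".. .. .."/!s/^([^,]+,).*\n/&\1/
  # Print the pattern space up to the first newline (the current line).
  P
  # Delete up to that newline and restart the cycle with what remains,
  # so the (possibly completed) next line becomes the current line.
  D
' "$input")

printf '%s\n' "$result"
rm -f "$input"
```

Because `D` restarts the cycle with the already completed line still in the pattern space, each line inherits the date from the line directly before it, which is how a date carries across several undated lines in a row.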