I need help writing a Unix script loop to process the following data:
200250|Wk50|200212|January|20024|Quarter4|2002|2002
|2003-01-12
|2003-01-18
|2003-01-05
|2003-02-01
|2002-11-03
|2003-02-01|
|2003-02-01|||||||
200239|Wk39|200209|October|20023|Quarter3|2002|2002
|2002-10-27
|2002-11-02
|2002-10-06
|2002-11-02
|2002-08-04
|2002-11-02|
|2003-02-01|||||||
I have data in the above format in a text file. What I need to do is remove the newline character from every line whose next line has | as its first character. The output I need is:
200250|Wk50|200212|January|20024|Quarter4|2002|2002|2003-01-12|2003-01-18|2003-01-05|2003-02-01|2002-11-03|2003-02-01||2003-02-01|||||||
200239|Wk39|200209|October|20023|Quarter3|2002|2002|2002-10-27|2002-11-02|2002-10-06|2002-11-02|2002-08-04|2002-11-02||2003-02-01|||||||
I need some help to achieve this. These shell commands are giving me nightmares!
Here is an awk solution:
$ awk 'substr($0,1,1)=="|"{printf $0;next} {printf "\n"$0} END{print""}' data
200250|Wk50|200212|January|20024|Quarter4|2002|2002|2003-01-12|2003-01-18|2003-01-05|2003-02-01|2002-11-03|2003-02-01||2003-02-01|||||||
200239|Wk39|200209|October|20023|Quarter3|2002|2002|2002-10-27|2002-11-02|2002-10-06|2002-11-02|2002-08-04|2002-11-02||2003-02-01|||||||
Explanation:
Awk implicitly loops through every line in the file.
substr($0,1,1)=="|"{printf $0;next}
If this line begins with a vertical bar, then print it (without a final newline) and then skip to the next line. We are using printf here, as opposed to the more common print, so that newlines are not printed unless we explicitly ask for them.
{printf "\n"$0}
If the line didn't begin with a vertical bar, print a newline and then this line (again without a final newline).
END{print""}
At the end of the file, print a newline.
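A quick way to check what this command emits is to run it on a tiny file and look at the line count and the first output line. This is only a sketch: the file name sample.txt, its contents, and the wrapper name first_version are invented for illustration; the awk program itself is the one explained above.

```shell
# sample.txt is a made-up input with two records, not the asker's data
printf '%s\n' 'A|1' '|2' 'B|3' '|4' > sample.txt

# first_version is a hypothetical wrapper around the command explained above
first_version() {
    awk 'substr($0,1,1)=="|"{printf $0;next} {printf "\n"$0} END{print""}' "$@"
}

# Count the output lines: two records come out as three lines
first_version sample.txt | wc -l

# Print only the first output line, which is empty
first_version sample.txt | head -n 1
```

The empty first line comes from the second rule: every line that does not start with a vertical bar is preceded by a newline, including the very first line of the file.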
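The command above prints an extra newline at the beginning of the output, because the first line of the file does not begin with a vertical bar and so is preceded by a newline like every other record. If that is a problem, it can be eliminated with a minor change: keep the separator in a variable (here called new) that is empty for the first record and set to a newline afterwards: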
$ awk 'substr($0,1,1)=="|"{printf $0;next} {printf new $0;new="\n"} END{print""}' data
200250|Wk50|200212|January|20024|Quarter4|2002|2002|2003-01-12|2003-01-18|2003-01-05|2003-02-01|2002-11-03|2003-02-01||2003-02-01|||||||
200239|Wk39|200209|October|20023|Quarter3|2002|2002|2002-10-27|2002-11-02|2002-10-06|2002-11-02|2002-08-04|2002-11-02||2003-02-01|||||||
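One caveat applies to both commands: printf $0 uses the input line itself as the format string, so a line containing a percent sign could be misinterpreted. That does not matter for the data shown here, but a more defensive sketch of the same approach passes an explicit "%s" format. The wrapper name join_continuations, the variable name sep, and the sample file are assumptions made for illustration.

```shell
# join_continuations is a hypothetical wrapper name, not part of the original answer
join_continuations() {
    awk '
        # Continuation line (starts with a vertical bar): append it, no newline
        /^[|]/ { printf "%s", $0; next }
        # New record: end the previous output line first (sep is empty the first time)
        { printf "%s%s", sep, $0; sep = "\n" }
        # Terminate the last output line if any input was read
        END { if (NR > 0) print "" }
    ' "$@"
}

# Made-up input that contains percent signs
printf '%s\n' '100%|a' '|b' '200%|c' '|d' > sample.txt
join_continuations sample.txt
```

The data is always passed as an argument to printf rather than as the format, so percent signs and backslashes in the file are printed as they are.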