I need help converting each multi-line entry into a single line with its fields in separate columns, and doing the same for every entry in the file.
File example (only 2 entries shown; the real file has many more like these):
>ABC
*
AGA-AUUCUC-CGGUUCAAUCU
|||
UCUAUAACCGCGCCGAGUUAGU
>ABC
*
AGAUAU-GCUGCAGGCUCAAUUG
||||||
UCUAUAACCGCG-CCGAGUUAGU
Required output format:
>ABC AGA-AUUCUC-CGGUUCAAUCU UCUAUAACCGCGCCGAGUUAGU
>ABC AGAUAU-GCUGCAGGCUCAAUUG UCUAUAACCGCG-CCGAGUUAGU
I can convert a single entry into the required format with:
tr '\n' '\t' <test3 | awk '{print $1,$3,$5}'
But how can I do this for every entry in the whole file?
I think you were on the right track with your original awk solution. Try this; I think it's a good combination of readable and effective:
awk 'BEGIN { RS = "" } { print $1, $3, $5 }' < myfile
The idea is to tell awk to treat blank lines as record separators. Setting RS to the empty string puts awk in "paragraph mode": each stanza between blank lines becomes a single record, and the whitespace inside it (here, single newlines) separates the fields. This is pretty similar to what you were doing with tr, except now awk runs through the whole file, processing one stanza at a time. (RS="\n\n" does the same thing in gawk and mawk, but POSIX leaves a multi-character RS unspecified, so the empty string is the portable spelling.)
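Paragraph mode only works if the entries are separated by blank lines, and the sample in the question shows none. If the file really has no blank lines but every entry is exactly five lines (header, `*`, sequence, match line, sequence), a minimal sketch like the following keys on the line number instead. `join_entries` is a hypothetical helper name, and the fixed five-line layout is an assumption: an entry with a missing or extra line would shift every entry after it.

```shell
#!/bin/sh
# Hypothetical helper: join fixed 5-line entries (header, "*", sequence,
# match line, sequence) into one output line.
# ASSUMPTION: every entry has exactly 5 lines and there are no blank lines.
join_entries() {
    awk '
        NR % 5 == 1 { id   = $1 }           # line 1 of an entry: header, e.g. >ABC
        NR % 5 == 3 { seq1 = $1 }           # line 3: first sequence
        NR % 5 == 0 { print id, seq1, $1 }  # line 5: second sequence, so print the row
    ' "$@"
}

# Demo on the two sample entries from the question (reads stdin).
join_entries <<'EOF'
>ABC
*
AGA-AUUCUC-CGGUUCAAUCU
|||
UCUAUAACCGCGCCGAGUUAGU
>ABC
*
AGAUAU-GCUGCAGGCUCAAUUG
||||||
UCUAUAACCGCG-CCGAGUUAGU
EOF
```

The same five-line assumption also allows `paste -d ' ' - - - - - < myfile | awk '{ print $1, $3, $5 }'`, which is closer to the original tr idea: paste joins every five input lines into one, and awk picks the fields as before (provided no line contains spaces).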