I have a file like this:
>ref
AAAAAAA
>seq1
BBBBBBB
>seq2
CCCCCCC
>seq3
DDDDDD
I want to get:
>ref
AAAAAAA
>seq1
BBBBBBB
>ref
AAAAAAA
>seq2
CCCCCCC
>ref
AAAAAAA
>seq3
DDDDDD
I was thinking of using this command in bash:
ref=$(head -n 2 file)
awk '/>/{print "'"$ref"'"}1' file
And here is what I get:
awk: non-terminated string >ref... at source line 2
context is
/>/{print ">ref >>>
<<<
Any idea what is happening? :) Thanks a lot!
Edit: I would like to use this pipeline for many files, each starting with a different ref (ref1 for file1, ref2 for file2, ...), so I was thinking of using head to store each ref in a variable and use it for each file :)
The problem is that when ref has the value
>ref
AAAAAA
your awk call
awk '/>/{print "'"$ref"'"}1' file
ends up as
awk '/>/{print ">ref
AAAAAA"}1' file
after shell expansion. Awk does not allow newlines inside string literals, so the program fails to parse with the "non-terminated string" error you saw. If the first two lines of your file were
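If you want to keep the head-into-a-variable approach, one option (a swapped-in technique, not part of the original answer) is to stop splicing $ref into the program text and hand it to awk as data through the environment. ENVIRON is part of POSIX awk, and unlike -v it does not process backslash escapes, so embedded newlines and backslashes pass through untouched. A minimal sketch, assuming the example input from the question is stored in a file named file; the output file name out.txt is hypothetical:

```shell
#!/bin/sh
# Recreate the example input from the question.
printf '%s\n' '>ref' AAAAAAA '>seq1' BBBBBBB '>seq2' CCCCCCC '>seq3' DDDDDD > file

# Command substitution strips the trailing newline, so ref is ">ref\nAAAAAAA".
ref=$(head -n 2 file)

# ENVIRON["ref"] is read as data, not parsed as program text, so the
# newline inside it is harmless. NR > 2 skips the two ref lines themselves;
# every later header line gets the ref printed before it.
ref="$ref" awk 'NR > 2 { if (/>/) print ENVIRON["ref"]; print }' file > out.txt

cat out.txt
```

The `ref="$ref" awk ...` prefix exports the variable only for that one awk command.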
>ref"
print "AAAAA
it would work (except there would be fluff at the top), but that does not help us find a sane solution.
A way to fix this with awk is to assemble ref in awk itself:
awk 'NR <= 2 { ref = ref $0 ORS; next } />/ { $0 = ref $0 } 1' filename
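To check the command against the question, the following sketch recreates the sample input in a file called filename (as in the command) and runs the awk program unchanged; the output file name out_awk.txt is only an assumption for the demo:

```shell
#!/bin/sh
# Sample input from the question.
printf '%s\n' '>ref' AAAAAAA '>seq1' BBBBBBB '>seq2' CCCCCCC '>seq3' DDDDDD > filename

# Same awk program as above: lines 1-2 become ref and are not printed,
# every later header line gets ref prepended.
awk 'NR <= 2 { ref = ref $0 ORS; next } />/ { $0 = ref $0 } 1' filename > out_awk.txt

cat out_awk.txt
```

The result should match the desired output from the question line for line.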
That is:
NR <= 2 {           # first two lines:
  ref = ref $0 ORS  #   build ref string (ORS is "\n" by default)
  next              #   and stop there
}
/>/ {               # after that: for lines that contain a >
  $0 = ref $0       #   prepend ref
}
1                   # then print
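For the edit (many files, each with its own ref), note that NR keeps counting across all input files, so NR <= 2 only ever captures the first file's ref. FNR restarts at 1 for each file. A sketch of a single awk call that handles several files, assuming the hypothetical inputs file1.fa and file2.fa and a hypothetical output naming scheme of FILENAME.out:

```shell
#!/bin/sh
# Hypothetical example inputs: each file starts with its own two-line ref.
printf '%s\n' '>ref1' AAAA '>seq1' BBBB '>seq2' CCCC > file1.fa
printf '%s\n' '>ref2' GGGG '>seqX' TTTT > file2.fa

awk '
  FNR == 1 {                    # first line of a new input file
    if (out != "") close(out)   # finish the previous output file
    ref = ""                    # forget the previous ref
    out = FILENAME ".out"       # hypothetical naming: file1.fa -> file1.fa.out
  }
  FNR <= 2 { ref = ref $0 ORS; next }   # build the ref for this file from lines 1-2
  />/      { $0 = ref $0 }              # prepend it to every later header
           { print > out }              # write to the per-file output
' file1.fa file2.fa
```

Closing each output file before opening the next keeps the number of open files at one, which matters once there are many inputs.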
Actually, I rather like sed for this one:
sed '1h; 2H; 1,2d; />/{ x; p; x; }' filename
That is:
1h          # first line: save to hold buffer
2H          # second line: append to hold buffer
1,2d        # first two lines: stop here
/>/ {       # after that: if line contains >
  x         #   swap hold buffer and pattern space
  p         #   print what used to be in the hold buffer (the first
            #   two lines that we saved above)
  x         #   swap back
}
# when we drop off the end, the original line will be
# printed.
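When sed is given several files at once, it treats them as one continuous stream by default, so line numbers 1 and 2 would refer only to the first file (GNU sed has a -s option to change that, but it is not portable). A portable sketch is to run the same sed script once per file in a shell loop; the input names file1.fa and file2.fa and the .sed.out suffix are hypothetical:

```shell
#!/bin/sh
# Hypothetical example inputs, each starting with its own two-line ref.
printf '%s\n' '>ref1' AAAA '>seq1' BBBB > file1.fa
printf '%s\n' '>ref2' GGGG '>seqX' TTTT '>seqY' CCCC > file2.fa

for f in file1.fa file2.fa; do
  # Each sed invocation starts counting lines at 1 again,
  # so 1h/2H pick up the ref of the current file.
  sed '1h; 2H; 1,2d; />/{ x; p; x; }' "$f" > "$f.sed.out"
done
```

In practice the file list would be a glob such as *.fa; the loop body stays the same.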