Use awk to print custom lines


I have a file like this:

>ref
AAAAAAA
>seq1
BBBBBBB
>seq2
CCCCCCC
>seq3
DDDDDD

I want to get:

>ref
AAAAAAA
>seq1
BBBBBBB
>ref
AAAAAAA
>seq2
CCCCCCC
>ref
AAAAAAA
>seq3
DDDDDD

I was thinking of using this command in bash:

ref=$(head -n 2 file)
awk '/>/{print "'"$ref"'"}1' file

And here is what I get:

awk: non-terminated string >ref... at source line 2
 context is
    />/{print ">ref >>> 
 <<< 

Any idea of what is happening? :) Thanks a lot!


Edit: I would like to use this pipeline for many files, each starting with a different ref (ref1 for file1, ref2 for file2, ...), and was thus thinking of using head to store each ref in a variable to use it for each file :)


Solution

    The problem

    The problem is that when ref has the value

    >ref
    AAAAAAA
    

    your awk call

    awk '/>/{print "'"$ref"'"}1' file
    

    ends up as

    awk '/>/{print ">ref
    AAAAAAA"}1' file
    

    after shell expansion. Awk does not allow a literal newline inside a string literal, so it stops with the "non-terminated string" error you saw. If the first two lines of your file were

    >ref"
    print "AAAAAAA
    

    it would work, because the spliced text would then form two complete print statements (except there would be fluff at the top), but that does not help us find a sane solution.
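
If you would rather keep the head-plus-variable approach, one option is to hand the value to awk through the environment instead of splicing it into the program text. The following is only a sketch under a few assumptions: the sample data is written to a temporary file for the demo, and the `NR > 2` guard is my addition so that the reference record itself (which also starts with `>`) is not printed twice.

```shell
# Sample input from the question, written to a temporary file for the demo.
file=$(mktemp)
printf '%s\n' '>ref' 'AAAAAAA' '>seq1' 'BBBBBBB' \
              '>seq2' 'CCCCCCC' '>seq3' 'DDDDDD' > "$file"

# Keep the original idea: grab the first two lines with head.
ref=$(head -n 2 "$file")

# Export ref for this one awk call only and read it back through ENVIRON,
# so the newline never becomes part of the awk program text.
# NR > 2 (an assumption about the desired output) skips the reference
# record at the top of the file; the second NR > 2 prints the remaining lines.
out=$(ref="$ref" awk 'NR > 2 && />/ { print ENVIRON["ref"] } NR > 2' "$file")

printf '%s\n' "$out"
rm -f "$file"
```

ENVIRON is used here rather than `-v ref="$ref"` because some awk implementations reject or reinterpret `-v` values that contain newlines or backslashes.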

    Solution in awk

    A way to fix this with awk is to assemble ref in awk itself:

    awk 'NR <= 2 { ref = ref $0 ORS; next } />/ { $0 = ref $0 } 1' filename
    

    That is

    NR <= 2 {                # First two lines:
      ref = ref $0 ORS       # build ref string (ORS is "\n" by default)
      next                   # and stop there
    }
    />/ {                    # after that: For lines that contain a >
      $0 = ref $0            # prepend ref
    }
    1                        # then print
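
For the many-files case mentioned in the edit to the question, the same awk program can simply be run once per file, since NR starts again at 1 for every awk invocation. This is a rough sketch: the `.fa` and `.out` extensions, the file names, and the second reference sequence are made up for the demo.

```shell
# Hypothetical demo directory with two files, each with its own reference.
dir=$(mktemp -d)
printf '%s\n' '>ref1' 'AAAAAAA' '>seq1' 'BBBBBBB' '>seq2' 'CCCCCCC' > "$dir/file1.fa"
printf '%s\n' '>ref2' 'GGGGGGG' '>seq3' 'DDDDDD' > "$dir/file2.fa"

# One awk call per file: each call collects that file's first two lines
# as ref and prepends them to every later line containing ">".
# The .out name is an assumption; pick whatever suits your pipeline.
for f in "$dir"/*.fa; do
  awk 'NR <= 2 { ref = ref $0 ORS; next } />/ { $0 = ref $0 } 1' "$f" > "${f%.fa}.out"
done

out1=$(cat "$dir/file1.out")
out2=$(cat "$dir/file2.out")
printf '%s\n' "$out1" "$out2"
rm -rf "$dir"
```

No head call or shell variable is needed this way, because each file's reference is picked up inside awk.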
    

    Solution in sed

    Actually I rather like sed for this one:

    sed '1h; 2H; 1,2d; />/{ x; p; x; }' filename
    

    That is:

    1h                # first line: save to hold buffer
    2H                # second line: append to hold buffer
    1,2d              # first two lines: delete and skip the rest of the script
    />/ {             # after that: If line contains >
      x               # swap hold buffer, pattern space
      p               # print what used to be in the hold buffer (the first
                      # two lines that we saved above)
      x               # swap back
    }
                      # when we drop off the end, the original line will be
                      # printed.
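
As a quick sanity check, both one-liners can be run against the sample from the question and compared. This is just a sketch; the temporary file and the variable names are only there for the demo.

```shell
# Sample input from the question, written to a temporary file for the demo.
file=$(mktemp)
printf '%s\n' '>ref' 'AAAAAAA' '>seq1' 'BBBBBBB' \
              '>seq2' 'CCCCCCC' '>seq3' 'DDDDDD' > "$file"

# Run the sed and awk versions from this answer on the same input.
sed_out=$(sed '1h; 2H; 1,2d; />/{ x; p; x; }' "$file")
awk_out=$(awk 'NR <= 2 { ref = ref $0 ORS; next } />/ { $0 = ref $0 } 1' "$file")

# Report whether the two commands agree, then show the result.
if [ "$sed_out" = "$awk_out" ]; then
  echo "sed and awk agree"
else
  echo "sed and awk differ"
fi
printf '%s\n' "$sed_out"
rm -f "$file"
```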