linux bash csh

How do I remove all lines from the beginning of a file until I reach a certain pattern, except for the last one?


Example:

>"one"
>"two"
>"three"
>"title"
>12 23 14
>...

I want to remove all lines at the beginning of the file up to the first line in which NF==3 (awk), but keep the line named "title". This should happen just once, at the beginning of the file, not repeatedly.

Thank you

Expected output:

>"title"
>12 23 14
>...

Solution

  • The way to do this is with awk, as you already suggested. You want to print the lines starting from the first one that has 3 fields; this can easily be done by setting a print flag (let's call it p):

    awk '(NF==3){p=1};p' file
    

    This will print everything starting from the first line with 3 fields.
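
    As a quick check, the flag approach can be tried on the sample from the question. This is only a sketch: the input is recreated with printf (the "..." line is left out), and the variable names input and out are illustrative.

```shell
# Recreate the known lines of the sample from the question.
input=$(printf '%s\n' '"one"' '"two"' '"three"' '"title"' '12 23 14')

# p becomes 1 at the first line with three fields; the bare pattern `p`
# then prints that line and every line after it.
out=$(printf '%s\n' "$input" | awk '(NF==3){p=1};p')

printf '%s\n' "$out"
```

    Only the line 12 23 14 comes out, so the "title" line is still missing at this stage.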

    However, you would also like to print the line that contains the string "title". This can be done by matching that string:

    awk '/title/{print}(NF==3){p=1};p' file
    

    The problem with this is that a line containing "title" is printed twice if it appears after the flag has been set, for example when your file looks like this:

    a          < not printed
    title      < printed
    a b c      < printed
    title      < printed twice
    e f g      < printed
    h          < printed
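
    Running the command on that sample should confirm the duplicate. A minimal sketch, with the annotations stripped from the input and illustrative variable names:

```shell
# The six-line sample, without the "< printed" annotations.
input=$(printf '%s\n' 'a' 'title' 'a b c' 'title' 'e f g' 'h')

# /title/{print} fires on every matching line, and the trailing `p`
# prints the same line a second time once the flag is set.
out=$(printf '%s\n' "$input" | awk '/title/{print}(NF==3){p=1};p')

printf '%s\n' "$out"
```

    The second "title" appears twice in a row, once from the /title/ rule and once from the flag.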
    

    So you have to be a bit more careful with the logic and combine the title check with the print condition:

    awk '(NF==3){p=1};(p || /title/)' file
    

    This again is not robust because you might have a file like:

    a          < not printed
    title 1    < printed
    b          < not printed
    title 2    < printed
    a b c      < printed
    h          < printed
    

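    and you only want "title 2" to be printed. In that case, remember the most recent title in s and print it once, when the first line with 3 fields is reached. The p==0 test matters: without it, every later line with 3 fields would print s again.

    awk '/title/{s=$0}(NF==3 && p==0){p=1;print s};p' file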
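
    A small sketch of the "remember the last title" idea, using a p==0 test so the saved title is emitted only at the first three-field line. The input is the sample above plus one extra three-field line (d e f), which is an assumption added here to exercise that test.

```shell
# Sample input; the 'd e f' line is a hypothetical extra data row.
input=$(printf '%s\n' 'a' 'title 1' 'b' 'title 2' 'a b c' 'd e f' 'h')

# s always holds the most recent title line. At the first three-field
# line (p is still 0) the flag is set and s is printed exactly once.
out=$(printf '%s\n' "$input" |
  awk '/title/{s=$0}(NF==3 && p==0){p=1;print s};p')

printf '%s\n' "$out"
```

    This should print "title 2", followed by a b c, d e f and h, with "title 1" dropped and no repeated title before d e f.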
    

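    If "title" simply refers to the line immediately before the first line with 3 fields, save every line in s and print s once when the flag is set (the p==0 test keeps later 3-field lines from printing their predecessor a second time):

    awk '(NF==3 && p==0){p=1;print s};p;{s=$0}' file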
    

    or, for a minor speedup, stop saving lines once printing has started:

    awk '(NF==3 && p==0){p=1;print s};p{print; next}{s=$0}' file
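
    To tie this back to the question, here is a hedged end-to-end sketch of the "previous line" variant on the original sample. The second data row (15 16 17) is an assumption standing in for further rows; it is there to check that the title is printed once and not before every three-field line.

```shell
# Sample from the question plus one hypothetical extra data row.
input=$(printf '%s\n' '"one"' '"two"' '"three"' '"title"' '12 23 14' '15 16 17')

# Before the flag is set, each line is saved in s. The first three-field
# line sets the flag and prints the saved line once; after that every
# line is printed and `next` skips the bookkeeping.
out=$(printf '%s\n' "$input" |
  awk '(NF==3 && p==0){p=1;print s};p{print; next}{s=$0}')

printf '%s\n' "$out"
```

    The result should be the "title" line followed by the two data rows, which matches the expected output in the question.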