bash, awk

How to make a regex skip a middle section?


I have files with lines such as these:

@Dacor 125#Apples were stored in Section 1.#Delivered on 02/03/2023. All ok.#

I am trying to write a regex that I can use with gsub to produce:

Apples were stored in Section 1

However, I don't know how to use a regex to isolate the middle section sandwiched between the first two # characters, even if I treat # as a delimiter.

So far, I have tried:

awk 'match($0, /@([^#]+)#(.*)#/, arr) {print arr[2]}'

This generates:

Apples were stored in Section 1.#Delivered on 02/03/2023. All ok.

I am unable to get the correct output.
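
The attempt overshoots because (.*) is greedy. It runs on to the last # on the line, so the pattern's final # matches the trailing # instead of the second one. Replacing .* with [^#]*, which cannot cross a #, fixes that. In GNU awk, writing ([^#]*) in place of (.*) in the command above is enough. The minimal sketch below assumes only a POSIX awk, because the three-argument match() is a GNU awk extension. It feeds the sample line through a line variable, which is used only for illustration:

```shell
# Sample line from the question (the variable is only for illustration).
line='@Dacor 125#Apples were stored in Section 1.#Delivered on 02/03/2023. All ok.#'

# Fix 1: POSIX match(). [^#]* cannot run past a #, so the match stops at the
# second #. match() sets RSTART/RLENGTH; substr() drops the two # characters.
printf '%s\n' "$line" | awk 'match($0, /#[^#]*#/) { print substr($0, RSTART + 1, RLENGTH - 2) }'
# prints: Apples were stored in Section 1.

# Fix 2: substitution, as the question suggests. sub() is the single-replacement
# form of gsub(). Delete up to the first #, then from the next # to the end.
printf '%s\n' "$line" | awk '{ sub(/^[^#]*#/, ""); sub(/#.*/, ""); print }'
# prints: Apples were stored in Section 1.
```

Both commands keep the field's trailing period.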


Solution

  • Method 1:

    Use cut to print the second field of the #-delimited file:

    cut -f2 -d'#' in_file > out_file
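
    A minimal sketch for checking Method 1 against the sample line from the question feeds that line on stdin instead of reading in_file. It assumes a POSIX shell, and the line variable is only for illustration. Since the question is tagged awk, it also shows the awk equivalent, where setting the field separator to # makes the wanted text $2:

    ```shell
    # Sample line from the question, fed on stdin instead of in_file.
    line='@Dacor 125#Apples were stored in Section 1.#Delivered on 02/03/2023. All ok.#'

    # cut: -d'#' sets the delimiter, -f2 selects the second field.
    printf '%s\n' "$line" | cut -f2 -d'#'
    # prints: Apples were stored in Section 1.

    # awk: -F'#' sets the field separator, so the wanted text is $2.
    printf '%s\n' "$line" | awk -F'#' '{ print $2 }'
    # prints: Apples were stored in Section 1.
    ```

    One difference: cut prints a line that contains no # unchanged unless -s is given, whereas the awk version prints an empty line for it.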
    

  • Method 2:

    Use GNU grep:

    grep -Po '^[^#]*#\K[^#]*' in_file > out_file
    

    Here, GNU grep uses the following options:
    -P : Use Perl regexes.
    -o : Print the matches only (1 match per line), not the entire lines.

    ^[^#]*# : beginning of the line, then 0 or more non-# characters, followed by a literal #.
    \K : Cause the regex engine to "keep" everything it had matched prior to the \K and not include it in the match. Specifically, ignore the preceding part of the regex when printing the match.
    [^#]* : 0 or more non-# characters, i.e. everything up to (but not including) the second #. This is the only part that is printed.
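
    To see what \K changes, a minimal sketch runs the pattern on the sample line with and without it. It assumes GNU grep built with PCRE support, which -P requires, and the line variable is only for illustration:

    ```shell
    # Sample line from the question, fed on stdin instead of in_file.
    line='@Dacor 125#Apples were stored in Section 1.#Delivered on 02/03/2023. All ok.#'

    # Without \K, -o prints the whole match, including the leading "@Dacor 125#".
    printf '%s\n' "$line" | grep -Po '^[^#]*#[^#]*'
    # prints: @Dacor 125#Apples were stored in Section 1.

    # With \K, the part matched before it is dropped from the printed match.
    printf '%s\n' "$line" | grep -Po '^[^#]*#\K[^#]*'
    # prints: Apples were stored in Section 1.
    ```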

    See also: