regex bash grep

Having Trouble with Bash RegEx and Grep – Need Assistance


I'm trying to write a bash script that uses a RegEx pattern and takes this as input:

  #
  #------------------------------------------- spaceholder ---------------------------------------------------------------------------
  #

  #@E2E-1 @id:1 
  Scenario: Login & Search: B2B_PKG_IN >> BE
    Given I am on the login page
    When I enter <username> and incorrect password multiple times
    Then I should be locked out of my account
    And I should see a lockout message



  #@E2E-2 @id:32 
  Scenario: Login & Search: B2B_PKG_IN >> NL
    Given I am on the login page
    When I enter <username> and incorrect password multiple times
    Then I should be locked out of my account
    And I should see a lockout message


  #
  #------------------------------------------- B2B_PKG_3PA ---------------------------------------------------------------------------


  #

  @E2E-3 @id:3
  Scenario: Login & Search: B2B_PKG_3PA >> BE
    Given I am on the login page
    When I enter <username> and incorrect password multiple times
    Then I should be locked out of my account
    And I should see a lockout message

I tested it with this ChatGPT-generated pattern: ((@[^\n]+|#@[^\n]+)?\s*)?Scenario:[^\n]*\n(?:[^\n]*\n)*?\n

It works exactly the way I want. On a RegEx testing website the output looks like this:

  #@E2E-1 @id:1 
  Scenario: Login & Search: B2B_PKG_IN >> BE
    Given I am on the login page
    When I enter <username> and incorrect password multiple times
    Then I should be locked out of my account
    And I should see a lockout message



  #@E2E-2 @id:32 
  Scenario: Login & Search: B2B_PKG_IN >> NL
    Given I am on the login page
    When I enter <username> and incorrect password multiple times
    Then I should be locked out of my account
    And I should see a lockout message



  @E2E-3 @id:3
  Scenario: Login & Search: B2B_PKG_3PA >> BE
    Given I am on the login page
    When I enter <username> and incorrect password multiple times
    Then I should be locked out of my account
    And I should see a lockout message

So I then tried it with grep in bash like this:

while IFS= read -r -d '' block; do
    current_scenarios+=("$block")
done < <(grep -Pzo "$pattern" "$current_file")

For some reason the output includes the "# spaceholder" comment part. I've tried grep -E, grep -oP, and grep -Po, but nothing seems to work. Please help.


Solution

  • Assuming:

    • Each block starts with the line beginning with #@ or @.
    • The next line in the block starts with the string Scenario:.
    • A block does not contain blank lines in between.
    • A block ends if followed by two or more blank lines (or the end of the file).
    • You want to assign a bash array current_scenarios with the matched blocks.
    • Each line of $current_file appears to have two leading spaces, but this looks like an artifact of editing the post and can be ignored.

    Then please try:

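    # Single quotes keep every backslash in the pattern literal.
    pattern='(?sm)^#?@[^\n]+\n*^Scenario:.*?\n(?=\n{2}|\n*\Z)'
    # -z makes grep read NUL-separated records, so a text file is matched as a whole;
    # -o prints each match followed by a NUL, which read -d '' then splits on.
    while IFS= read -r -d '' block; do
        current_scenarios+=("$block")
    done < <(grep -Pzo "$pattern" "$current_file")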
    
    printf "%s\n\n" "${current_scenarios[@]}"       # just to see the results
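
    To try the loop without touching a real feature file, here is a minimal sketch. It assumes GNU grep built with PCRE support (the -P option). The temporary file and its shortened contents are made up for illustration and only mirror the layout from the question, without the two leading spaces.

```bash
#!/usr/bin/env bash
# Hypothetical sample file: same layout as in the question, shortened,
# and without the two leading spaces.
current_file=$(mktemp)
cat > "$current_file" <<'EOF'
#
#---- spaceholder ----
#

#@E2E-1 @id:1
Scenario: Login & Search: B2B_PKG_IN >> BE
  Given I am on the login page
  And I should see a lockout message



#@E2E-2 @id:32
Scenario: Login & Search: B2B_PKG_IN >> NL
  Given I am on the login page


#
#---- B2B_PKG_3PA ----


#

@E2E-3 @id:3
Scenario: Login & Search: B2B_PKG_3PA >> BE
  Given I am on the login page
EOF

pattern='(?sm)^#?@[^\n]+\n*^Scenario:.*?\n(?=\n{2}|\n*\Z)'

# Collect every NUL-terminated match printed by grep into an array.
current_scenarios=()
while IFS= read -r -d '' block; do
    current_scenarios+=("$block")
done < <(grep -Pzo "$pattern" "$current_file")

echo "blocks found: ${#current_scenarios[@]}"
printf '%s\n' "${current_scenarios[@]}"

rm -f "$current_file"
```

    With this sample the loop should collect the three scenario blocks and skip the placeholder comment lines, provided the installed grep supports -P.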
    

    Explanation of the regex (?sm)^#?@[^\n]+\n*^Scenario:.*?\n(?=\n{2}|\n*\Z):

    • The (?s) option makes a dot . match a newline.
    • The (?m) option makes ^ and $ match start/end of the line, as well as the start/end of the input string.
    • ^#?@ matches either #@ or @ at the start of the line.
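    • [^\n]+\n* consumes the rest of the current line and any newlines that follow it.
    • ^Scenario: matches Scenario: at the start of the next non-blank line.
    • .*?\n matches the following lines, as few as possible (non-greedy).
    • (?=\n{2}|\n*\Z) is a lookahead assertion that succeeds when two blank lines come next, or when only newlines remain before the end of the input.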
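
    The answer sets aside the two leading spaces as an editing artifact. If the real file does contain them, the anchors ^#?@ and ^Scenario: would no longer match. One possible variant, offered as a sketch rather than a tested fix for your file, allows optional spaces or tabs after each ^. The names indented_file, indented_pattern and blocks are made up for this example, and blank lines are assumed to be truly empty (no stray spaces).

```bash
#!/usr/bin/env bash
# Hypothetical indented sample: every non-blank line starts with two spaces.
indented_file=$(mktemp)
cat > "$indented_file" <<'EOF'
  #
  #---- spaceholder ----
  #

  #@E2E-1 @id:1
  Scenario: Login & Search: B2B_PKG_IN >> BE
    Given I am on the login page


  @E2E-3 @id:3
  Scenario: Login & Search: B2B_PKG_3PA >> BE
    Given I am on the login page
EOF

# Same pattern as in the answer, with [ \t]* after each ^ to allow indentation.
indented_pattern='(?sm)^[ \t]*#?@[^\n]+\n*^[ \t]*Scenario:.*?\n(?=\n{2}|\n*\Z)'

blocks=()
while IFS= read -r -d '' block; do
    blocks+=("$block")
done < <(grep -Pzo "$indented_pattern" "$indented_file")

echo "blocks found: ${#blocks[@]}"
printf '%s\n' "${blocks[@]}"

rm -f "$indented_file"
```

    Each captured block keeps its original indentation, so strip it afterwards if the leading spaces are not wanted.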