
How to split by regex in shell script


I have the following output example:

[OK] AAA.BBBBBB
aaabbbcccdddfffed
asdadadadadadsada
[OK] CCC.KKKKKKK
some text here
[OK] OKO.II

and I want to parse it with this regex: \[OK\](\s+\w+)\.(\w+)\n([^\[]+)


But when I run my shell script, shown here:

#!/bin/bash

# Define the text to parse
text="[OK] AAA.BBBBBB
aaabbbcccdddfffed
asdadadadadadsada
[OK] CCC.KKKKKKK
some text here
[OK] OKO.II"

# Create an empty list to hold the group lists
# Loop through the text and extract all matches
regex_pattern="\[OK\](\s+\w+)\.(\w+)\n([^\[]+)"
while [[ $text =~ $regex_pattern ]]; do
  # Create a list to hold the current groups
  echo "Matched_1: ${BASH_REMATCH[1]}"
  echo "Matched_2: ${BASH_REMATCH[2]}"
  echo "Matched_3: ${BASH_REMATCH[3]}"
  echo "-------------------"
done

it does not output anything.


Solution

  • Bash's =~ operator does not do global matching: each test finds only the first match in the string.

    What you can do instead: after each match, remove everything up to and including the matched text from the string, then test again.

    text="[OK] AAA.BBBBBB
    aaabbbcccdddfffed
    asdadadadadadsada
    [OK] CCC.KKKKKKK
    some text here
    [OK] OKO.II"
    
    re=$'\[OK\][[:space:]]+([[:alnum:]_]+)\.([[:alnum:]_]+)([^[]*)'
    #                  no newline characters in the regex  ^^^^^^^
    
    while [[ $text =~ $re ]]; do
        # output the match info
        declare -p BASH_REMATCH
        # and remove the matched text from the start of the string
        # (don't forget the quotes here!)
        text=${text#*"${BASH_REMATCH[0]}"}
    done
    

    This outputs:

    declare -a BASH_REMATCH=([0]=$'[OK] AAA.BBBBBB\naaabbbcccdddfffed\nasdadadadadadsada\n' [1]="AAA" [2]="BBBBBB" [3]=$'\naaabbbcccdddfffed\nasdadadadadadsada\n')
    declare -a BASH_REMATCH=([0]=$'[OK] CCC.KKKKKKK\nsome text here\n' [1]="CCC" [2]="KKKKKKK" [3]=$'\nsome text here\n')
    declare -a BASH_REMATCH=([0]="[OK] OKO.II" [1]="OKO" [2]="II" [3]="")
    

    Clearly, this destroys the $text variable, so make a copy if you need it after the loop.
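
    One way to keep the original intact is to run the loop inside a function that works on a local copy. The following is only a sketch: the function name list_ok_headers and the "group1|group2" output format are made up for illustration, and the sample text is a shortened version of the one above.

```shell
#!/usr/bin/env bash

# Hypothetical helper: prints one "group1|group2" line per [OK] header.
# It consumes a local copy, so the caller's variable is left untouched.
list_ok_headers() {
    local rest=$1   # local copy of the input text
    local re='\[OK\][[:space:]]+([[:alnum:]_]+)\.([[:alnum:]_]+)([^[]*)'
    while [[ $rest =~ $re ]]; do
        printf '%s|%s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
        # drop everything up to and including the matched text
        rest=${rest#*"${BASH_REMATCH[0]}"}
    done
}

text=$'[OK] AAA.BBBBBB\naaabbbcccdddfffed\n[OK] CCC.KKKKKKK\nsome text here\n[OK] OKO.II'
headers=$(list_ok_headers "$text")
printf '%s\n' "$headers"
```

    After the call, $text still holds the full input, because only the function's local variable rest was shortened.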

    The regex makes the solution a bit fragile: the third group, ([^[]*), stops at the first [ it meets, so the lines following a header cannot contain any open brackets.
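
    If the lines following a header may contain brackets, a possible workaround is to read the text line by line and match only the header lines. This sketch assumes every header sits alone on its own line and starts in the first column; the array names names and bodies are made up for illustration.

```shell
#!/usr/bin/env bash

# Sample input: the second line contains brackets on purpose.
text=$'[OK] AAA.BBBBBB\nvalue [1] here\n[OK] CCC.KKKKKKK\nsome text here'

# Matches a whole header line only (assumption: nothing else shares the line).
header_re='^\[OK\][[:space:]]+([[:alnum:]_]+)\.([[:alnum:]_]+)$'
names=()    # "group1.group2" for each header, in order
bodies=()   # lines that follow each header, each terminated by a newline

while IFS= read -r line || [[ -n $line ]]; do
    if [[ $line =~ $header_re ]]; then
        names+=("${BASH_REMATCH[1]}.${BASH_REMATCH[2]}")
        bodies+=("")
    elif (( ${#names[@]} > 0 )); then
        # append the line to the body of the most recent header
        last=$(( ${#bodies[@]} - 1 ))
        bodies[last]+="${line}"$'\n'
    fi
done <<< "$text"

declare -p names bodies
```

    Lines that appear before the first header are ignored here; whether that is the right choice depends on the real input.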


    Having said all that, this is not what bash is really good for. I'd use awk or perl for this task.
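
    As a rough idea of what the awk route could look like, here is a sketch called from a shell script. It assumes that any line whose first whitespace-separated field is exactly [OK] is a header, and it prints the two names plus the number of lines that follow; that output format is chosen only for illustration.

```shell
#!/usr/bin/env bash

text=$'[OK] AAA.BBBBBB\naaabbbcccdddfffed\nasdadadadadadsada\n[OK] CCC.KKKKKKK\nsome text here\n[OK] OKO.II'

# Print one "name1 name2 body-line-count" record per [OK] header.
summary=$(printf '%s\n' "$text" | awk '
    $1 == "[OK]" {
        if (seen) print a, b, n          # flush the previous record
        split($2, parts, ".")            # "AAA.BBBBBB" -> parts[1], parts[2]
        a = parts[1]; b = parts[2]; n = 0; seen = 1
        next
    }
    seen { n++ }                         # count lines under the current header
    END { if (seen) print a, b, n }
')
printf '%s\n' "$summary"
```

    Because awk processes the input one line at a time, brackets in the lines following a header do not matter, and the shell variable is never modified.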