Tags: regex, bash, shell, regex-group

Why does echoing an array from a function not return the same result as the inline code?


I have the following case:

regex: $'\[OK\][[:space:]]+([[:alnum:]_]+)\.([[:alnum:]_]+)([^[]*)'

text:

[OK] AAA.BBBBBB
aaabbbcccdddfffed
asdadadadadadsada
[OK] CCC.KKKKKKK
some text here
[OK] OKO.II

If I test it on https://regex101.com/r/qw4B5O/1, it looks like this:

[image: regex101 match result]

Now, if I run the following code:

var_test=()
while [[ $text =~ $regex ]]; do
  var_test+=("${BASH_REMATCH[@]:1}")
  text=${text#*"${BASH_REMATCH[0]}"}
done
declare -p var_test

I get the correct output:

declare -a var_test=([0]="AAA" [1]="BBBBBB" [2]=$'\naaabbbcccdddfffed\nasdadadadadadsada\n' [3]="CCC" [4]="KKKKKKK" [5]=$'\nsome text here\n' [6]="OKO" [7]="II" [8]="")

But once I convert it into a function like this:

function split_by_regex {
  regex=$1
  text=$2
  groups=()
  while [[ $text =~ $regex ]]; do
    groups+=("${BASH_REMATCH[@]:1}")
    text=${text#*"${BASH_REMATCH[0]}"}
  done
  echo "${groups[@]}"
}

res=($(split_by_regex "$regex" "$text"))
declare -p res

I get the wrong output:

declare -a res=([0]="AAA" [1]="BBBBBB" [2]="aaabbbcccdddfffed" [3]="asdadadadadadsada" [4]="CCC" [5]="KKKKKKK" [6]="some" [7]="text" [8]="here" [9]="OKO" [10]="II")

After some debugging, the problem seems to come from echo "${groups[@]}": if I inspect groups inside the function it looks as it should, but the result I get back from the function does not.

Sorry if this is an obvious question, but I am new to bash and shell scripting and I am trying to figure it out.


Solution

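  • The command substitution in res=($(split_by_regex ...)) is unquoted, so bash word-splits the function's output on spaces, tabs, and newlines. echo has already flattened the array into a single space-separated string, so the original element boundaries are lost. For performance reasons, passing the array back directly (through a nameref or a global variable) is the most efficient way. Where that is not possible, readarray can parse the standard output of the command into an array.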

    For simple cases, where the output will NOT contain newlines, the function can print one element per line with printf, and the caller can read the lines back with readarray:

    function foo {
        local out=(foo "bar baz" 123 "A B C")
        # Print one element per line
        printf "%s\n" "${out[@]}"
    }
    
    # -t strips the trailing newline from each element
    readarray -t res <<< "$(foo)"
    

    For the general case, where the output may contain newlines, use NUL as the separator (similar to -print0 or -0 in many GNU utilities), then parse the output with NUL as the delimiter. This relies on readarray -d, which requires bash 4.4 or later. If NUL does not work, \1 can be used instead.

    Also, a here-string (<<<) cannot be used in this case. Bash appends a newline to the here-string text, which produces an extra element when a custom delimiter is used, and the command substitution $(...) would also drop the NUL bytes. Use process substitution instead:

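    function foo {
        local out=(foo "bar baz" 123 $'a\nb' "A B C")
        # Terminate each element with a NUL byte
        printf "%s\0" "${out[@]}"
    }
    
    # -d '' makes NUL the delimiter; < <(foo) avoids the here-string
    readarray -d '' -t res < <(foo)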
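
The nameref option mentioned at the start of this answer could look roughly like the sketch below. It assumes bash 4.3 or later (for local -n). The function name split_by_regex_into and the internal name _out are made up for this illustration, and the caller's array must not itself be called _out.

```shell
#!/usr/bin/env bash
# Sketch: hand the groups back through a nameref instead of stdout.
# split_by_regex_into and _out are hypothetical names, not from the question.

split_by_regex_into() {
  local -n _out=$1          # _out is an alias for the caller's array
  local regex=$2 text=$3    # local copies, so the caller's text is untouched
  _out=()
  while [[ $text =~ $regex ]]; do
    _out+=("${BASH_REMATCH[@]:1}")        # append capture groups 1..n
    text=${text#*"${BASH_REMATCH[0]}"}    # drop everything up to the end of the match
  done
}

regex=$'\[OK\][[:space:]]+([[:alnum:]_]+)\.([[:alnum:]_]+)([^[]*)'
text=$'[OK] AAA.BBBBBB\naaabbbcccdddfffed\nasdadadadadadsada\n[OK] CCC.KKKKKKK\nsome text here\n[OK] OKO.II'

res=()                      # create the target array before passing its name
split_by_regex_into res "$regex" "$text"
declare -p res
```

Because nothing passes through echo or a command substitution, no word splitting takes place, so the multi-line groups and the final empty group should come back as single elements, as in the inline version from the question.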