
Split text file into array based on an empty line or any non used character


I have a text file containing blocks of text lines separated by empty lines. I want to load the content of that file into an array, using the empty lines as separators. I tried IFS="\n" (and "\r\n", etc.) but couldn't get it to work, so instead I thought I would replace every empty line with a character that isn't in the file. I picked the Spanish inverted question mark (\xBF):

sed 's/^$/'"$(echo -e "\xBF")"'/'

So that works: I have a character that I can use to slice up my file and put it into an array. (A bit of a random trick, but hey, that's just one way of doing it.)

Now I need to change $IFS so it will use the inverted question mark to slice up the data for the array.

If I type

IFS=$(echo -e "\xBF")

at the command line, it works just fine:

 echo "$IFS"
¿

But if I type that command followed by read -a, it seems to do nothing:

[user@machine ~]$ IFS=$(echo -e "\xBF") read -a array <<< "$var"
[user@machine ~]$ echo "$IFS"
[user@machine ~]$

So that's weird because $var has a value.

Even more surprising, when I check the value of IFS right afterwards, I get:

[user@machine ~]$ echo -n "$IFS" | od -abc
0000000  sp  ht  nl
    040 011 012
         \t  \n
0000003
[user@machine ~]$ 

Which is the default value for IFS.

I am pretty sure one can use any character for IFS, no?

Alternatively, if you have a trick up your sleeve for splitting a file into an array based on empty lines, I am interested! (Still, I'd like to get to the bottom of this for comprehension's sake.)

Thanks very much, and have a good week-end :)


Solution

  • First of all, by design, a variable assigned as a prefix to a command, as in var=foo command, is only made available to that command and is not set for the rest of the script. That is why IFS is back to its default value after your read.
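
    A minimal sketch of that scoping rule, assuming bash; the sample value and the ':' separator are made up for the demonstration and are not taken from the question:

    ```shell
    #!/usr/bin/env bash
    # Sample input, invented for this demo.
    var='a:b:c'

    # The IFS assignment is a prefix to read, so only read sees it.
    IFS=: read -r -a parts <<< "$var"

    echo "${#parts[@]}"            # prints 3: read did split on ':'
    printf '%s' "$IFS" | od -c     # the shell's own IFS is still space, tab, newline
    ```

    So read did use the custom separator; it just never touched the IFS of the calling shell.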

    As for your problem, read reads one record, up to the first delimiter (-d, default: line feed), and then splits that record into fields using $IFS. Without -d, it therefore only ever sees the first line of $var.
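
    To see the record delimiter and the field separators as two separate steps, here is a small sketch, assuming bash and a made-up two-line value:

    ```shell
    #!/usr/bin/env bash
    # Hypothetical two-line value.
    var=$'one\ntwo'

    # Default delimiter: read stops at the first newline.
    read -r -a first <<< "$var"

    # -d '' makes NUL the delimiter, so read consumes everything.
    # It returns non-zero because it hits end of input first, hence "|| true".
    read -r -d '' -a all <<< "$var" || true

    echo "${#first[@]}"   # prints 1
    echo "${#all[@]}"     # prints 2 (the default IFS splits on the newline)
    ```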

    To loop over your items, you can use

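    # \xBF in the replacement is a GNU sed escape; "|| [ -n ... ]" keeps the last item
    sed -e 's/^$/\xBF/' | while IFS= read -r -d $'\xBF' var || [ -n "$var" ]
    do
        printf "Value: %s\n-----\n" "$var"
    done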
    

    To read them all into an array from a string, make read consume the whole string by setting the delimiter to a character the data cannot contain, such as a NUL byte (-d ''), and let $IFS do the splitting:

    IFS=$'\xBF' read -d '' -a array <<< "$var"
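
    As for splitting on empty lines without any placeholder character, one possible approach is a plain loop that collects lines until it meets an empty one. This is only a sketch, assuming bash; the names input, blocks, current and nl and the sample text are invented for the example:

    ```shell
    #!/usr/bin/env bash
    # Hypothetical sample input: three blocks separated by empty lines.
    input=$'first block\nsecond line\n\nsecond block\n\nthird block'

    nl=$'\n'
    blocks=()
    current=''
    while IFS= read -r line || [ -n "$line" ]; do
        if [ -z "$line" ]; then
            # An empty line closes the block collected so far.
            if [ -n "$current" ]; then
                blocks+=("$current")
            fi
            current=''
        else
            # Append the line, with a newline between lines of the same block.
            current+="${current:+$nl}$line"
        fi
    done <<< "$input"

    # Store the last block if the input did not end with an empty line.
    if [ -n "$current" ]; then
        blocks+=("$current")
    fi

    printf 'Value: %s\n-----\n' "${blocks[@]}"
    ```

    To run it on a file, replace the here-string with a redirection from that file. Consecutive empty lines are treated as a single separator in this sketch.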