How do I stop `read` with `IFS` from merging together whitespace characters?


Take this piece of code, which reads data separated by `|`:

DATA1="Andreas|Sweden|27"
DATA2="JohnDoe||30"   # <---- UNKNOWN COUNTRY
while IFS="|" read -r NAME COUNTRY AGE; do 
    echo "NAME:    $NAME";
    echo "COUNTRY: $COUNTRY";
    echo "AGE:     $AGE";
done<<<"$DATA2"

OUTPUT:

NAME:    JohnDoe
COUNTRY:
AGE:     30

It should work identically to this piece of code, which does exactly the same thing but uses `\t` as the separator instead of `|`:

DATA1=$'Andreas\tSweden\t27'
DATA2=$'JohnDoe\t\t30'  # <---- THERE ARE TWO TABS HERE
while IFS=$'\t' read -r NAME COUNTRY AGE; do 
    echo "NAME:    $NAME";
    echo "COUNTRY: $COUNTRY";
    echo "AGE:     $AGE";
done<<<"$DATA2"

But it doesn't.

OUTPUT:

NAME:    JohnDoe
COUNTRY: 30
AGE:

Bash, `read`, `IFS`, or some other part of the code is collapsing the consecutive whitespace characters when it isn't supposed to. Why is this happening, and how can I fix it?


Solution

  • bash is behaving exactly as it should. From the bash documentation:

    The shell treats each character of IFS as a delimiter, and splits the results of the other expansions into words on these characters. If IFS is unset, or its value is exactly <space><tab><newline>, the default, then sequences of <space>, <tab>, and <newline> at the beginning and end of the results of the previous expansions are ignored, and any sequence of IFS characters not at the beginning or end serves to delimit words. If IFS has a value other than the default, then sequences of the whitespace characters space and tab are ignored at the beginning and end of the word, as long as the whitespace character is in the value of IFS (an IFS whitespace character). Any character in IFS that is not IFS whitespace, along with any adjacent IFS whitespace characters, delimits a field. A sequence of IFS whitespace characters is also treated as a delimiter.
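
    The rule quoted above can be seen side by side in a short sketch. `split_demo` is a hypothetical helper written only for this illustration; it assumes bash (it uses `local`, `$'...'` and here-strings):

    ```shell
    # split_demo SEP LINE: split LINE into three fields on SEP and print
    # each field in brackets so that empty fields are visible.
    split_demo() {
        local sep=$1 line=$2 a b c
        IFS=$sep read -r a b c <<<"$line"
        printf '[%s] [%s] [%s]\n' "$a" "$b" "$c"
    }

    # '|' is not IFS whitespace, so the empty middle field survives.
    split_demo '|' 'JohnDoe||30'        # prints: [JohnDoe] [] [30]

    # Tab is IFS whitespace, so the two tabs act as a single delimiter.
    split_demo $'\t' $'JohnDoe\t\t30'   # prints: [JohnDoe] [30] []
    ```

    Only space, tab and newline get this "sequence counts as one delimiter" treatment; any other character in IFS delimits a field every time it appears.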

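    To overcome this "feature", you could convert each tab to a non-whitespace character that does not occur in the data (`;` here) before splitting, something like the following: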

    #!/bin/bash
    
    DATA1=$'Andreas\tSweden\t27'
    DATA2=$'JohnDoe\t\t30'  # <---- THERE ARE TWO TABS HERE
    
    # tr is used here because \t inside a sed pattern is a GNU extension
    echo "$DATA2" | tr '\t' ';' |
    while IFS=';' read -r NAME COUNTRY AGE; do
        echo "NAME:    $NAME"
        echo "COUNTRY: $COUNTRY"
        echo "AGE:     $AGE"
    done
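
    If you would rather not start an external process, the same idea might be sketched in pure bash with parameter expansion. Note that the loop in the pipeline above runs in a subshell, so NAME, COUNTRY and AGE are gone once it ends; the variant below reads in the current shell. `parse_tsv_line` is a hypothetical name, and the sketch assumes the data never contains the ASCII unit separator (0x1F):

    ```shell
    # parse_tsv_line LINE: split a tab-separated LINE into NAME, COUNTRY
    # and AGE, keeping empty fields.
    # Assumption: LINE never contains the unit separator character (0x1F).
    parse_tsv_line() {
        local tab=$'\t' us=$'\x1f' line
        line=${1//$tab/$us}     # swap every tab for a non-whitespace delimiter
        IFS=$us read -r NAME COUNTRY AGE <<<"$line"
    }

    parse_tsv_line $'JohnDoe\t\t30'
    echo "NAME:    $NAME"
    echo "COUNTRY: $COUNTRY"
    echo "AGE:     $AGE"
    ```

    Here COUNTRY comes out empty and AGE is 30, matching the `|` version from the question.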