Tags: excel, bash, csv, associative-array, carriage-return

Reading CSV in Bash into a Dictionary/Associative array


I am trying to read a csv file into a bash associative array but am not getting the results I expect.

Using Bash 5.0.18

    Bellum:fox3-api rocky$ bash --version
    GNU bash, version 5.0.18(1)-release (x86_64-apple-darwin19.5.0)
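Associative arrays (declare -A) were added in Bash 4.0, while the /bin/bash that ships with macOS is version 3.2, so it matters which bash the "#!/usr/bin/env bash" line finds on PATH. A minimal guard, assuming you would rather have the script stop than misbehave on an older shell; require_bash4 is a hypothetical name:

```shell
#!/usr/bin/env bash
# require_bash4 (hypothetical helper): fail if this Bash predates
# associative arrays, which were introduced in Bash 4.0.
require_bash4() {
  if (( BASH_VERSINFO[0] < 4 )); then
    echo "This script needs Bash 4+ (running $BASH_VERSION)" >&2
    return 1
  fi
}

require_bash4 || exit 1
declare -A descriptions=()
echo "Bash $BASH_VERSION supports associative arrays"
```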

Contents of foobar.csv

    Bellum:scripts rocky$ cat ./foobar.csv
    foo-1,bar-1
    foo-2,bar-2
    foo-3,bar-3

Contents of problem.sh

    #!/usr/bin/env bash
    
    declare -A descriptions
    while IFS=, read name title; do
          echo "I got:$name|$title"
          descriptions[$name]=$title
    done < foobar.csv
    
    echo ${descriptions["foo-1"]}
    echo ${descriptions["foo-2"]}
    echo ${descriptions["foo-3"]}

Actual Output from problem.sh

    Bellum:scripts rocky$ ./problem.sh
    I got:foo-1|bar-1
    I got:foo-2|bar-2
    
    bar-2
    
    Bellum:scripts rocky$

Desired output:

    I got:foo-1|bar-1
    I got:foo-2|bar-2
    I got:foo-3|bar-3
    bar-1
    bar-2
    bar-3
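Printing each key and value with printf '%q' is one way to see what the loop actually stored, because %q escapes characters such as a carriage return that are invisible in normal output. This sketch rebuilds a sample file with the same bytes as the od -c dump below (the temporary file is an assumption standing in for foobar.csv) and runs the same loop as problem.sh, with -r added:

```shell
#!/usr/bin/env bash
# Recreate the question's bytes: UTF-8 BOM, CRLF line endings, no final newline.
sample=$(mktemp)
printf '\357\273\277foo-1,bar-1\r\nfoo-2,bar-2\r\nfoo-3,bar-3' > "$sample"

# Same loop as problem.sh (plus -r, which changes nothing for this data).
declare -A descriptions=()
while IFS=, read -r name title; do
  descriptions[$name]=$title
done < "$sample"

# %q makes hidden bytes visible; a trailing carriage return is shown as \r.
for key in "${!descriptions[@]}"; do
  printf '%q -> %q\n' "$key" "${descriptions[$key]}"
done
rm -f "$sample"
```

Only two keys come back, neither value is plain "bar-N", and there is no plain "foo-1" key at all, which matches the symptoms in the actual output.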

Outputs Requested in the Comments

    Bellum:scripts rocky$ head -n 1 ./foobar.csv | hexdump -C
    00000000  ef bb bf 66 6f 6f 2d 31  2c 62 61 72 2d 31 0d 0a  |...foo-1,bar-1..|
    00000010
    Bellum:scripts rocky$ od -c foobar.csv
    0000000  357 273 277   f   o   o   -   1   ,   b   a   r   -   1  \r  \n
    0000020    f   o   o   -   2   ,   b   a   r   -   2  \r  \n   f   o   o
    0000040    -   3   ,   b   a   r   -   3
    0000050
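The dump shows three separate problems: a BOM (357 273 277) before "foo-1", a \r before each \n, and no newline after "bar-3". A sketch that checks a file for all three without reading hex by eye; check_csv is a hypothetical name, and the sample file again reproduces the bytes above:

```shell
#!/usr/bin/env bash
# check_csv FILE (hypothetical helper): report a leading UTF-8 BOM,
# carriage returns, and a missing newline at the end of the file.
check_csv() {
  local file=$1 first3 last1
  first3=$(head -c 3 "$file" | od -An -tx1 | tr -d ' \n')
  last1=$(tail -c 1 "$file" | od -An -tx1 | tr -d ' \n')
  [[ $first3 == efbbbf ]] && echo "UTF-8 BOM at start of file"
  grep -q $'\r' "$file" && echo "carriage returns (CRLF line endings)"
  [[ -s $file && $last1 != 0a ]] && echo "no newline at end of file"
  return 0
}

sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf '\357\273\277foo-1,bar-1\r\nfoo-2,bar-2\r\nfoo-3,bar-3' > "$sample"
check_csv "$sample"
```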

Cyrus's dos2unix change

    #!/usr/bin/env bash
    
    declare -A descriptions
    dos2unix < foobar.csv | while IFS=, read name title; do
          echo "I got:$name|$title"
          descriptions[$name]=$title
    done
    
    echo ${descriptions["foo-1"]}
    echo ${descriptions["foo-2"]}
    echo ${descriptions["foo-3"]}

Output of Cyrus's dos2unix change

    Bellum:scripts rocky$ ./problem.sh
    I got:foo-1|bar-1
    I got:foo-2|bar-2
    
    
    
    
    Bellum:scripts rocky$
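A likely reason this version prints only empty lines: in Bash, each part of a pipeline, including the while loop after the "|", runs in a subshell by default, so the assignments to descriptions are gone once the loop finishes. (The loop also still stops after two lines, which suggests the unterminated last line is still there.) A sketch contrasting a pipe with process substitution, which keeps the loop in the current shell; the printf input is a stand-in for the CSV file:

```shell
#!/usr/bin/env bash
# Loop fed by a pipe: it runs in a subshell, so its assignments are lost.
count_with_pipe() {
  local -A seen=()
  printf 'a,1\nb,2\n' | while IFS=, read -r k v; do seen[$k]=$v; done
  echo "${#seen[@]}"
}

# Loop fed by process substitution: it runs in the current shell.
count_with_process_substitution() {
  local -A seen=()
  while IFS=, read -r k v; do seen[$k]=$v; done < <(printf 'a,1\nb,2\n')
  echo "${#seen[@]}"
}

echo "pipe: $(count_with_pipe) entries"                                  # pipe: 0 entries
echo "process substitution: $(count_with_process_substitution) entries"  # process substitution: 2 entries
```

Applied to the question, that would mean ending the loop with done < <(dos2unix < foobar.csv); the BOM and the missing final newline may still need handling. In Bash 4.2+, shopt -s lastpipe in a non-interactive script is another way to keep the last pipeline element in the current shell.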

The CSV file was created on a Mac by saving it as CSV from Microsoft Excel. Thanks in advance for any insights.

Hybrid Solution

For future readers: this problem was actually two issues. The first came from saving my CSV file from a Microsoft Excel for Mac workbook. I used Save As... with the "CSV UTF-8" format (the first CSV format listed in Excel's drop-down menu). That format writes a UTF-8 byte-order mark (BOM, the bytes EF BB BF) at the start of the file, and read treated those bytes as part of the first name, so the first key was not plain "foo-1". The bytes don't show up in cat output (see the original problem description), but they are visible in the hexdump and od output. Saving the CSV from Excel as "Comma Separated Values" instead (much further down the drop-down list of formats) got rid of this first problem.

Second, @Léa Gris and @glenn jackman pointed me toward changes to my script that handle the carriage returns (\r) and the missing final newline that Excel left in the file.
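If you would rather clean the exported file than re-save it from Excel, a sketch like the following strips the BOM and the carriage returns and adds the missing final newline. The name normalize_csv is hypothetical, and tr -d '\r' deletes every carriage return, which assumes no field legitimately contains one:

```shell
#!/usr/bin/env bash
# normalize_csv SRC DST (hypothetical helper): drop a leading UTF-8 BOM,
# delete carriage returns, and make sure DST ends with a newline.
normalize_csv() {
  local src=$1 dst=$2 bom=$'\357\273\277'
  if [[ $(head -c 3 "$src") == "$bom" ]]; then
    tail -c +4 "$src"            # skip the 3 BOM bytes
  else
    cat "$src"
  fi | tr -d '\r' > "$dst"
  if [[ -s $dst && -n $(tail -c 1 "$dst") ]]; then
    printf '\n' >> "$dst"        # the last line was unterminated
  fi
}

src=$(mktemp) dst=$(mktemp)
trap 'rm -f "$src" "$dst"' EXIT
printf '\357\273\277foo-1,bar-1\r\nfoo-2,bar-2\r\nfoo-3,bar-3' > "$src"
normalize_csv "$src" "$dst"
od -c "$dst"
```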

Thanks, everyone. I spent a full day trying to figure this out. Lesson learned: I should have turned to Stack Overflow much sooner.


Solution

  • Here's why you don't get the output you expect:

        Bellum:scripts rocky$ od -c foobar.csv
        0000000  357 273 277   f   o   o   -   1   ,   b   a   r   -   1  \r  \n
        0000020    f   o   o   -   2   ,   b   a   r   -   2  \r  \n   f   o   o
        0000040    -   3   ,   b   a   r   -   3
        0000050
    
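    1. The name on the first line is not just "foo-1": it starts with the three bytes 357 273 277 (EF BB BF in hex), the UTF-8 byte-order mark (BOM) that the "CSV UTF-8" export writes.
      • They can be removed with "${name#$'\357\273\277'}"
    2. The last line does not end with a newline, so the while-read loop only iterates twice.
      • read returns non-zero if it can't read a whole line, even if it reads some characters.
      • Since read returns "false", the while loop ends, even though name and title were filled from the partial line.
      • This can be worked around by using:
        while IFS=, read -r name title || [[ -n $title ]]; do ...
        #............................. ^^^^^^^^^^^^^^^^^^

      • Or, just fix the file.
    3. The first two lines end with \r\n (CRLF), so title picks up a trailing carriage return ("bar-1" is stored as "bar-1\r").
      • It can be removed with "${title%$'\r'}"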
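The behavior described in point 2 can be checked directly. In this sketch the printf input stands in for the file and its last line has no newline: read returns 1 on that line but still fills the variables, which is why the "|| [[ -n $title ]]" guard lets the loop body run once more:

```shell
#!/usr/bin/env bash
# Stand-in input whose last line ("c,3") has no trailing newline.
input() { printf 'a,1\nb,2\nc,3'; }

# Plain loop: read fails on the unterminated line, so the body runs twice.
count_plain() {
  local name title n=0
  while IFS=, read -r name title; do n=$((n + 1)); done < <(input)
  echo "$n"
}

# Guarded loop: the partial line is still processed.
count_guarded() {
  local name title n=0
  while IFS=, read -r name title || [[ -n $title ]]; do n=$((n + 1)); done < <(input)
  echo "$n"
}

# read's status on the unterminated line alone: 1, yet both fields are set.
IFS=, read -r name title < <(printf 'c,3')
echo "status=$? name=$name title=$title"                  # status=1 name=c title=3
echo "plain: $(count_plain), guarded: $(count_guarded)"   # plain: 2, guarded: 3
```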

    Result:

    BOM=$'\357\273\277'
    CR=$'\r'
    
    declare -A descriptions
    while IFS=, read -r name title || [[ -n $title ]]; do
      descriptions["${name#$BOM}"]=${title%$CR}
    done < foobar.csv
    
    declare -p descriptions
    echo "${descriptions["foo-1"]}"
    echo "${descriptions["foo-2"]}"
    echo "${descriptions["foo-3"]}"
    
    Output:
    
    declare -A descriptions=([foo-1]="bar-1" [foo-2]="bar-2" [foo-3]="bar-3" )
    bar-1
    bar-2
    bar-3
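If the same cleanup is needed for several files, the steps above can be wrapped in a function. This is a sketch, not part of the original answer: load_csv_pairs is a hypothetical name, it uses a nameref (local -n, Bash 4.3+) so the caller chooses the array, and it only handles simple two-column data without quoted fields:

```shell
#!/usr/bin/env bash
# load_csv_pairs ARRAY FILE (hypothetical helper): fill the associative array
# named ARRAY from a two-column CSV, tolerating a BOM, CRLF line endings and
# a missing final newline. Quoted fields are not supported, and the caller's
# array must not itself be named _pairs.
load_csv_pairs() {
  local -n _pairs=$1
  local file=$2 bom=$'\357\273\277' cr=$'\r' line key value first=1
  while IFS= read -r line || [[ -n $line ]]; do
    line=${line%"$cr"}                    # drop the CR of a CRLF ending
    if (( first )); then
      line=${line#"$bom"}                 # a BOM can only appear on line 1
      first=0
    fi
    IFS=, read -r key value <<< "$line"   # split at the first comma
    [[ -z $key ]] && continue             # skip blank lines and empty keys
    _pairs[$key]=$value
  done < "$file"
}

sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf '\357\273\277foo-1,bar-1\r\nfoo-2,bar-2\r\nfoo-3,bar-3' > "$sample"

declare -A descriptions=()
load_csv_pairs descriptions "$sample"
declare -p descriptions
```

Reading whole lines first and stripping the CR before splitting means a blank CRLF line is skipped instead of creating a key that contains only "\r".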