Search code examples
sqlarraysbashvariablesquoting

How to assign a string column to variable and have it quoted in output in Bash script


In this post, I had asked a question as to how I can have my script read a CSV header and auto-quote string values; i.e. automatically wrap those columns whose data type is string and would need "" in order to be inserted into a SQL table. Apparently, this would be more than what Bash can handle?

At any rate, I'd welcome any help regarding how I can make the following script work: here, I have basically the same script but I tried to manually define separate columns and assign them to variables with col3 being a string column, hence quoted. Needless to say, it doesn't do what I want it to do (i.e. give me quoted values for all the string data under col3). Thanks!

#!/bin/bash

echo Path to to-be-imported CSV:
read csv_file
echo Table name to import into:
read table

echo "INSERT INTO $table VALUES" > SQL_INSERT_$table.txt
while read col1 col2 col3 col4
do
 echo "($col1 $col2 "$col3" $col4),"
done < <(tail -n +2 $csv_file) >> SQL_INSERT_$table.txt && sed -i '' '$ s/.$/;/' SQL_INSERT_$table.txt

Solution

  • Given the following csv file:

    /tmp/csv:

    Year,Make,Model,Description,Price
    1997,Ford,E350,moon,-3000.00
    1997,Ford,E350,moon,3000.00
    1999,Chevy,Venture Extended Edition,,4900.00
    1999,Chevy,Venture Extended Edition Very Large,,5000.00
    

    And the code:

    awk -F, 'OFS=FS {for (i=1;i<=NF;i++) {if (match($i, /^[0-9.-]+$/)==0) {printf "\"" $i "\""} else {printf $i}; if (i<NF) printf OFS}; printf "\n"}' /tmp/csv
    

    It outputs:

    All the string types of values are quoted.

    "Year","Make","Model","Description","Price"
    1997,"Ford","E350","moon",3000.00
    1999,"Chevy","Venture Extended Edition","",4900.00
    1999,"Chevy","Venture Extended Edition Very Large","",5000.00
    

    The code is based on these assumptions:

    • There are no literal delimiters(the comma , here) inside the field values.
    • There are no literal newlines inside the field values.
    • There are no null values for the integer or the decimal fields.

    If the assumptions don't stand for your data, you may need a CSV parser like this. And the exceptional field values should be well quoted, otherwise, even a parser won't know what to do.

    BTW: Unless your data is very simple and the row number is a few, this is not the recommended way to manipulate the data. Many DBMS has an import/load utility that can be used in a program to do such a job.