Search code examples
mongodbcsvmongoimport

CSV Mongoimport throws bare " in non quoted fields


I have a csv file which contains ["] if the value for that attribute is null. For example:

a,b,c
1,2,3
4,[""],6
7,8,9

I have two RHEL systems with different Mongo versions: 2.4.1 and 3.4 When I run this command:

time mongoimport --host hostname:27017 --db test --collection abc --file example.csv --type csv

This throws an error in version 3.4:

 bare " in non-quoted-field

It works fine in 2.4.1 though!!! I need it to work on 3.4. Can anyone suggest a solution? the CSV files I have are huge and I dont want to alter the data.


Solution

  • MongoImport 3.4 documentation says :

    The csv parser accepts that data that complies with RFC RFC 4180. As a result, backslashes are not a valid escape character. If you use double-quotes to enclose fields in the CSV data, you must escape internal double-quote marks by prepending another double-quote.

    Hence, in your case, i.e. [""], it considered the first quote to be escape character due to which it is left with a bare " and due to which failed. To escape the double-quote, you can use it like [""""]. Here each double quote escapes the next double quote

    To fix this, you have the following two options :

    1. Open this csv in MS Excel. It will reformat the whole file correctly.
    2. If [""] just denotes an empty field, you may directly replace this pattern with an empty string. Mongoimport will not create a field for the empty values in the corresponding doc.

    EDIT :

    As discussed, it is used to represent a null value, then you can totally replace this string using sed like

    sed 's/\[""\]//g' FILE-NAME