
mongoimport: set type for all fields when importing CSV


I have several problems using mongoimport to import a CSV file that has a header line.

Here is the situation:

I have a big CSV file with the names of the fields in the first line. I know this line can be used as the field names with --headerline.

I want all field types to be strings, but mongoimport infers each type automatically from what the value looks like.

IDs such as 0001 will be turned into 1, which can have bad side effects.

Unfortunately, there is (as far as I know) no way of setting them all to string with a single option. Instead, you have to name each field and set its type with

--columnsHaveTypes --fields "name.string(), ... "

When I did that, the next problem appeared: the header line (with all the field names) was imported as values in a separate document.

So basically, my questions are:

  • Is there a way of setting all field types to string when using the --headerline option?

  • Alternatively, is there a way to ignore the first line?


Solution

  • I found a solution that I am comfortable with.

    Basically, I wanted to use mongoimport from my Clojure code to import a CSV file into the DB and then do a lot of processing on it automatically. Because of the problems mentioned above, I had to find a workaround to delete the wrong document.

    I did the following to "solve" this problem:

    To set the types as I wanted, I wrote a function that reads the first line, puts the field names in a vector, and then uses string concatenation to build the typed field list.

    Turning this: id,name,age,hometown,street

    into this: id.string(),name.string(),age.string() etc

    Then I used the values from the vector to identify the document with

       { name : "name",
         age : "age",
         etc : "etc" }
    

    and then deleted it with a simple remove command.

    Hope this helps anyone dealing with the same kind of problem.
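
For anyone who wants to try the same workaround, here is a rough sketch of the two steps in Python (my own code was Clojure, so this is an illustration, not what I actually ran). It builds the typed `--fields` value from the header line and the filter that matches the stray header document. The database, collection, and file names (`mydb`, `people`, `people.csv`) are made up for the example, and the command is only printed, not executed.

```python
import csv
import io
import shlex


def read_field_names(header_line):
    """Parse the CSV header line into a list of field names."""
    return next(csv.reader(io.StringIO(header_line)))


def typed_field_spec(names, type_name="string"):
    """Build the value for --fields, e.g. 'id.string(),name.string()'."""
    return ",".join(f"{name}.{type_name}()" for name in names)


def header_row_filter(names):
    """Build a filter matching the document created from the header line,
    i.e. the one where every field's value equals its own field name."""
    return {name: name for name in names}


names = read_field_names("id,name,age,hometown,street\n")
fields = typed_field_spec(names)

# Hypothetical database, collection, and file names, for illustration only.
command = [
    "mongoimport",
    "--db", "mydb",
    "--collection", "people",
    "--type", "csv",
    "--file", "people.csv",
    "--columnsHaveTypes",
    "--fields", fields,
]

# Show the shell-quoted command and the filter for the header document.
print(shlex.join(command))
print(header_row_filter(names))
```

After the import, the filter can be passed to whatever delete call your driver offers (with PyMongo that would presumably be `collection.delete_one(header_row_filter(names))`). Matching on every field keeps the risk of deleting a real record low, since a genuine row would have to repeat all of the column names as its values.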