Search code examples
mongodbmongoimport

Mongoimport csv files with string _id and upsert


I'm trying to use mongoimport to upsert data with string values in _id. Since the ids look like integers (even though they're in quotes), mongoimport treats them as integers and creates new records instead of upserting the existing records.

Command I'm running:

mongoimport --host localhost --db database --collection my_collection --type csv --file mydata.csv --headerline --upsert

Example data in mydata.csv:

{ "_id" : "0364", someField: "value" }

The result would be for mongo to insert a record like this: { "_id" : 364, someField: "value" } instead of updating the record with _id "0364".

Does anyone know how to make it treat the _id as strings?

Things that don't work:

  • Surrounding the data with double double quotes ""0364"", double and single quotes "'0364'" or '"0364"'
  • Appending empty string to value: { "_id" : "0364" + "", someField: "value" }

Solution

  • Unfortunately there is not now a way to force number-like strings to be interpreted as strings:

    https://jira.mongodb.org/browse/SERVER-3731

    You could write a script in Python or some other language with which you're comfortable, along the lines of:

    import csv, pymongo
    
    connection = pymongo.Connection()
    collection = connection.mydatabase.mycollection
    reader = csv.DictReader(open('myfile.csv'))
    for line in reader:
        print '_id', line['_id']
        upsert_fields = {
            '_id': line['_id'],
            'my_other_upsert_field': line['my_other_upsert_field']}
    
        collection.update(upsert_fields, line, upsert=True, safe=True)