How can I change the structure of all documents in a CouchDB database?


I have to change the structure of all existing documents in one of my CouchDB databases that contain a certain field. Right now, the field is just a simple String, for example:

{
  // some other fields
  "parameters": {
     "typeId": "something",
     "otherField": "dont_care"
  }
}

In this example, the field I'm interested in is "typeId". I want to make it an array of Strings because the requirements for it have changed :( But I obviously need to keep the current value of the field in all documents! So, from the example above, the result would be:

{
  // some other fields
  "parameters": {
     "typeId": [ "something" ], // now we can have more items here
     "otherField": "dont_care"
  }
}

Any ideas how this can be achieved?

Just in case this helps: my Java web-application communicates with CouchDB through the Ektorp library.


Solution

  • I would say first write a function (or method, or class) that converts old-style documents into new-style documents and also correctly handles irrelevant documents (such as a design document) if necessary. Write some unit tests until you are confident about this code.
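
    As a rough illustration of such a conversion routine, here is a minimal sketch in plain Java. It assumes each document has already been deserialized into a `Map<String, Object>` (for example by Jackson); the class name `TypeIdConverter` and its `convert` method are hypothetical, not part of Ektorp. It returns `null` for documents that need no change, such as design documents or documents that are already new-style.

    ```java
    import java.util.ArrayList;
    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.Map;

    // Hypothetical helper for illustration; not part of Ektorp or CouchDB.
    final class TypeIdConverter {

        /**
         * Returns a converted copy of the document, or null when the document
         * has no "parameters.typeId" string and therefore needs no change.
         * The input map is left untouched.
         */
        @SuppressWarnings("unchecked")
        static Map<String, Object> convert(Map<String, Object> doc) {
            Object params = doc.get("parameters");
            if (!(params instanceof Map)) {
                return null; // design documents and unrelated documents
            }
            Map<String, Object> oldParams = (Map<String, Object>) params;
            Object typeId = oldParams.get("typeId");
            if (!(typeId instanceof String)) {
                return null; // field is missing or already an array
            }

            // Wrap the existing value so it becomes the first array element.
            List<Object> wrapped = new ArrayList<>();
            wrapped.add(typeId);

            Map<String, Object> newParams = new LinkedHashMap<>(oldParams);
            newParams.put("typeId", wrapped);

            // Shallow copy keeps _id, _rev and every other field as it was.
            Map<String, Object> converted = new LinkedHashMap<>(doc);
            converted.put("parameters", newParams);
            return converted;
        }
    }
    ```

    Because the routine is a pure function over maps, it can be unit tested without a running CouchDB instance, and running it twice on the same document is harmless.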

    The next step is basically a loop of finding old-style documents and updating them to become new-style documents, using your modification routine.

    If you have a small data set, you can simply query /my_db/_all_docs?include_docs=true and work on your entire data set in one batch. If you have a larger data set, perhaps write a view which will identify old-style documents:

    function(doc) {
      // map function for the "to_do" view: emit documents whose typeId is still a plain string
      if (doc.parameters && typeof doc.parameters.typeId === "string")
        emit(doc._id, doc);
    }
    

    This view will show you all old-style documents that still need converting. To grab 50 more documents to convert, GET /my_db/_design/converter/_view/to_do?limit=50. Each row's "value" field will be a complete copy of the document, so you can run it through your converter function immediately.

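    Once you convert a document, you can either POST it back to the database (keeping its _id and _rev fields so that CouchDB treats it as an update), or build up a batch and use _bulk_docs to do the same. (Bulk docs is the same thing, just a little faster.) As each document is stored, it will disappear from the to_do view. (If you get a 409 Conflict error, just ignore it: the document was changed in the meantime, so it stays in the to_do view and will come back with its new revision on a later pass.) Re-run this procedure until there are 0 rows in to_do and you're done!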
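
    The fetch, convert, store loop described above could look roughly like the following sketch. The `DocStore` interface and the `Migration` class are hypothetical names introduced only for illustration; in a real application `fetchToDo` would presumably be backed by a query on the to_do view and `saveAll` by a _bulk_docs call (with Ektorp, probably `queryView` and `executeBulk`, but treat that mapping as an assumption). The stall counter is an extra safeguard, not something the procedure above requires.

    ```java
    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;
    import java.util.function.UnaryOperator;

    /** Hypothetical abstraction over the database; not an Ektorp or CouchDB type. */
    interface DocStore {
        /** Returns up to {@code limit} old-style documents, e.g. the row values of the to_do view. */
        List<Map<String, Object>> fetchToDo(int limit);

        /** Stores the documents, e.g. via _bulk_docs, and returns how many were accepted (409 conflicts are not counted). */
        int saveAll(List<Map<String, Object>> docs);
    }

    final class Migration {

        /**
         * Converts batches until the to-do list is empty and returns the number
         * of documents stored. Gives up after {@code maxStalledPasses} consecutive
         * passes in which nothing could be stored, to avoid looping forever.
         */
        static int run(DocStore store,
                       UnaryOperator<Map<String, Object>> converter,
                       int batchSize,
                       int maxStalledPasses) {
            int stored = 0;
            int stalled = 0;
            while (true) {
                List<Map<String, Object>> batch = store.fetchToDo(batchSize);
                if (batch.isEmpty()) {
                    return stored; // 0 rows left in to_do: done
                }
                List<Map<String, Object>> converted = new ArrayList<>();
                for (Map<String, Object> doc : batch) {
                    Map<String, Object> result = converter.apply(doc);
                    if (result != null) { // null means "nothing to change"
                        converted.add(result);
                    }
                }
                // Conflicting documents are simply not counted; they stay in
                // the to-do list and are fetched again on the next pass.
                int accepted = converted.isEmpty() ? 0 : store.saveAll(converted);
                stored += accepted;
                if (accepted > 0) {
                    stalled = 0;
                } else if (++stalled >= maxStalledPasses) {
                    throw new IllegalStateException("No progress after " + stalled + " passes");
                }
            }
        }
    }
    ```

    Keeping the database behind a small interface like this means the loop itself can be exercised with an in-memory fake before it ever touches production data.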

    You can judge from your situation how careful you need to be. If this is production data, you had better write good unit tests! If it is a development environment, just go for it!

    A final trick is to create a new, empty database and replicate your main database to it. Now you have a duplicate sandbox to try your ideas. You can delete and re-replicate your sandbox until you are happy with your results.
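
    For the sandbox trick, one way to trigger the copy from Java is a POST to CouchDB's /_replicate endpoint. The sketch below only builds the request with the JDK's `java.net.http` API (Java 11+). The class name `SandboxReplication` is hypothetical, the server URL and database names are placeholders, and it assumes a server that needs no authentication and database names that need no JSON or URL escaping.

    ```java
    import java.net.URI;
    import java.net.http.HttpRequest;

    // Hypothetical helper for illustration only.
    final class SandboxReplication {

        /** JSON body asking CouchDB to copy sourceDb into targetDb, creating the target if needed. */
        static String body(String serverUrl, String sourceDb, String targetDb) {
            // Assumes the names contain no characters that need escaping.
            return String.format(
                    "{\"source\":\"%s/%s\",\"target\":\"%s/%s\",\"create_target\":true}",
                    serverUrl, sourceDb, serverUrl, targetDb);
        }

        /** Builds (but does not send) the POST /_replicate request. */
        static HttpRequest buildRequest(String serverUrl, String sourceDb, String targetDb) {
            return HttpRequest.newBuilder(URI.create(serverUrl + "/_replicate"))
                    .header("Content-Type", "application/json")
                    .POST(HttpRequest.BodyPublishers.ofString(body(serverUrl, sourceDb, targetDb)))
                    .build();
        }
    }
    ```

    The request can then be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`. To start over, delete the sandbox database and send the same request again.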