
CouchDB views and many (thousands of) document types


I'm studying CouchDB and I'm picturing a worst-case scenario:

for each document type I need 3 views, and this application can generate 10 thousand document types.

By "document type" I mean the structure of the document.

After insertion of a new document, CouchDB makes 3*10K calls to view functions searching for the right document type.

Is this true? Is there a smarter solution than creating a database for each doc type?

Document example (assume that no two documents have the same structure; in this example the data is stored under different keys):

    [
      {
        "_id": "1251888780.0",
        "_rev": "1-582726400f3c9437259adef7888cbac0",
        "type": "sensorX",
        "value": {"valueA": "123"}
      },
      {
        "_id": "1251888781.0",
        "_rev": "1-37259adef7888cbac06400f3c9458272",
        "type": "sensorY",
        "value": {"valueB": "456"}
      },
      {
        "_id": "1251888782.0",
        "_rev": "1-6400f3c945827237259adef7888cbac0",
        "type": "sensorZ",
        "value": {"valueC": "789"}
      }
    ]

Views example (in this example, only one view per doc type):

  "views":
  {
    "sensorX": {
      "map": "function(doc) { if (doc.type == 'sensorX')  emit(null, doc.valueA) }"
    },
    "sensorY": {
      "map": "function(doc) { if (doc.type == 'sensorY')  emit(null, doc.valueB) }"
    },
    "sensorZ": {
      "map": "function(doc) { if (doc.type == 'sensorZ')  emit(null, doc.valueC) }"
    },
  }

Solution

  • The results of the map() function in CouchDB are cached the first time you request the view; after that, only new or changed documents are run through it. Let me explain with a quick illustration.

    • You insert 100 documents to CouchDB

    • You request the view. CouchDB now runs the map() function against the 100 documents and caches the results.

    • You request the view again. The data is read from the indexed view data, no documents have to be re-mapped.

    • You insert 50 more documents

    • You request the view. The 50 new documents are mapped and merged into the index with the old 100 documents.

    • You request the view again. The data is read from the indexed view data, no documents have to be re-mapped.
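
    A minimal sketch of the same sequence over CouchDB's HTTP API; the sensors database and readings design document names are made up for illustration:

        # 1. Insert documents; no map function runs at this point
        curl -X POST http://127.0.0.1:5984/sensors/_bulk_docs \
             -H 'Content-Type: application/json' \
             -d '{"docs": [{"type": "sensorX", "value": {"valueA": "123"}}]}'

        # 2. First request of the view: CouchDB maps the new documents
        #    and stores the results in the view index
        curl 'http://127.0.0.1:5984/sensors/_design/readings/_view/sensorX'

        # 3. Same request again: answered straight from the stored index,
        #    nothing is re-mapped
        curl 'http://127.0.0.1:5984/sensors/_design/readings/_view/sensorX'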

    I hope that makes sense. If you're concerned about a big load being generated when a user requests a view after lots of new documents have been added, you could have your import process request the view itself (to map the new documents right away) and have the user's request include stale=ok (see the sketch at the end of this answer).

    The CouchDB book is a really good resource for information on CouchDB.
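
    Here is a rough sketch of that import/stale=ok pattern over CouchDB's HTTP API, again with the made-up sensors database and readings design document:

        # Import process: touch the view after loading new documents so they are
        # mapped right away; limit=0 updates the index without returning any rows
        curl 'http://127.0.0.1:5984/sensors/_design/readings/_view/sensorX?limit=0'

        # User request: stale=ok returns whatever is already in the index
        # without waiting for an index update
        curl 'http://127.0.0.1:5984/sensors/_design/readings/_view/sensorX?stale=ok'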