
CouchDB Mango Performance vs. Map/Reduce Views


I've just noticed that the CouchDB 2.0 release notes say Mango queries are recommended for new applications. They also state that Mango indexes are apparently 2x to 10x faster than JavaScript queries, which really surprised me. As such, I have a number of questions:

  • Are Map/Reduce views being phased out? I'm expecting the answer to be no, since it seems to me that Mango does not cover all the use cases of Map/Reduce (the easiest example being Reduce itself), and this querying style also seems less flexible. But I prefer to ask because of this recommendation:

We recommend all new apps start using Mango as a default.

  • We know that Map/Reduce views rely on B-trees, but I can't find any insight in the docs or on the mailing list regarding the magic behind Mango. Mango is essentially white magic to me at the moment. Yet I can tell that in-depth knowledge of how JavaScript views are indexed behind the scenes was massively helpful for avoiding pitfalls and naive implementations, as well as for optimizing performance. Does anyone have any insight into how Mango works? Are its indexes B-trees too? When are the indexes updated, since there are no longer design documents? Where do the performance gains come from? (These gains are counter-intuitive to me since, in my understanding, the performance of JavaScript queries came from the precomputed nature of map functions.)
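To make the point about Reduce concrete, here is a minimal sketch (Python standard library only) of a classic Map/Reduce view that counts users per city with CouchDB's built-in `_count` reducer. A Mango `_find` query in 2.x returns matching documents, not aggregated rows like these. The server URL, the `users` database, the `_design/stats` document and the `city` field are assumptions for illustration, and the requests are only built, not sent.

```python
import json
import urllib.request

# Illustrative assumptions: server URL, database name and design document name.
COUCH = "http://localhost:5984"
DB = "users"

# The map function emits one row per user keyed by city; the built-in
# "_count" reducer then aggregates those rows. Mango's _find (2.x) returns
# documents and has no equivalent aggregation step.
design_doc = {
    "_id": "_design/stats",
    "views": {
        "by_city": {
            "map": "function (doc) { if (doc.city) { emit(doc.city, null); } }",
            "reduce": "_count",
        }
    },
}

# Build (but do not send) the request that stores the design document.
put_req = urllib.request.Request(
    f"{COUCH}/{DB}/_design/stats",
    data=json.dumps(design_doc).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="PUT",
)

# Querying the view with group=true returns one aggregated row per distinct city.
count_url = f"{COUCH}/{DB}/_design/stats/_view/by_city?group=true"

if __name__ == "__main__":
    print(put_req.get_method(), put_req.full_url)
    print("GET", count_url)
    # urllib.request.urlopen(put_req) would send it to a running server.
```

Against a running server, the grouped view returns rows shaped like `{"key": "<city>", "value": <count>}`.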

What I'm essentially after is, on the one hand, some insight into Mango and, on the other hand, an overview of how Mango and Map/Reduce are supposed to coexist in the 2.x era.


Solution

  • Answer from a core developer:

    Some good questions. I don't think Mango will ever replace Map/Reduce completely; it is an alternative querying tool. What is great about the Mango query syntax is that it is a lot easier to understand and get started with. And we can use it in a lot of places beyond just querying for documents: it can be used for replication filtering and for the changes feed. We hope to soon support validating document updates (validate_doc_update) with it as well.
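A selector is plain JSON, which is what makes it reusable outside of queries. As a hedged sketch (Python standard library only), the same selector below drives a filtered changes feed (`POST /{db}/_changes?filter=_selector`) and a filtered replication document in the `_replicator` database, both of which CouchDB 2.x accepts. The server URL, database names and the selector itself are illustrative assumptions, authentication is omitted, and the requests are built but not sent.

```python
import json
import urllib.request

# Illustrative assumptions: a CouchDB 2.x server at this URL and a "users" database.
COUCH = "http://localhost:5984"
DB = "users"

# One Mango selector, reused for two non-query purposes.
SELECTOR = {"type": "user", "name": "John Rambo"}


def json_post(path, body):
    """Build (but do not send) a JSON POST request for the given server path."""
    return urllib.request.Request(
        COUCH + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Changes feed that only reports documents matching the selector.
changes_req = json_post(f"/{DB}/_changes?filter=_selector", {"selector": SELECTOR})

# Filtered replication: a document stored in the _replicator database.
replication_req = json_post("/_replicator", {
    "source": f"{COUCH}/{DB}",
    "target": f"{COUCH}/rambos",  # hypothetical target database
    "create_target": True,
    "selector": SELECTOR,
})

if __name__ == "__main__":
    for req in (changes_req, replication_req):
        print(req.get_method(), req.full_url)
        # urllib.request.urlopen(req) would send it to a running server.
```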

    Underneath, Mango uses Erlang map/reduce, which means it creates a B-tree index just like map/reduce. What makes it faster is that it uses native Erlang functions to build the B-tree instead of JavaScript. I wrote a blog post a long time ago about the internals of PouchDB-find [1], which is the Mango syntax for PouchDB. It might help you understand a little more about how the internals work. The key thing to understand is that a query has a map part, which uses the B-tree, and an in-memory filter. Ideally, the less in-memory filtering you do, the faster your query will be.
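The split between a B-tree lookup and an in-memory filter can be illustrated with a toy model. In the sketch below, a sorted Python list searched with `bisect` stands in for the B-tree, and any selector fields not covered by the index are checked afterwards in memory. This is an assumption-laden illustration of the idea only, not CouchDB's Erlang implementation; the documents and field names are made up.

```python
from bisect import bisect_left, bisect_right

# Toy model only: a sorted list searched with bisect stands in for the
# B-tree CouchDB builds for a Mango index. This is not CouchDB's code.
docs = {
    "a": {"name": "Alice", "city": "Paris"},
    "b": {"name": "John Rambo", "city": "Hope"},
    "c": {"name": "John Rambo", "city": "Paris"},
    "d": {"name": "Zed", "city": "Hope"},
    "e": {"name": "Eve", "city": "Paris"},
}


def build_index(field):
    """'Map' step: emit (doc[field], doc_id) for every doc that has the field, sorted by key."""
    rows = sorted((doc[field], doc_id) for doc_id, doc in docs.items() if field in doc)
    return [key for key, _ in rows], [doc_id for _, doc_id in rows]


def find(index, field, selector):
    """Equality lookup on the indexed field, then in-memory filtering of the other fields.

    Returns (matching doc ids, number of candidates that had to be filtered in memory).
    """
    keys, ids = index
    lo = bisect_left(keys, selector[field])
    hi = bisect_right(keys, selector[field])
    candidates = ids[lo:hi]  # rows read from the "B-tree" key range
    rest = {f: v for f, v in selector.items() if f != field}
    matches = [i for i in candidates if all(docs[i].get(f) == v for f, v in rest.items())]
    return matches, len(candidates)


if __name__ == "__main__":
    query = {"name": "John Rambo", "city": "Paris"}
    print(find(build_index("name"), "name", query))  # (['c'], 2)
    print(find(build_index("city"), "city", query))  # (['c'], 3)
```

Both indexes return the same document, but the `name` index leaves two candidates to filter in memory while the `city` index leaves three: the more selective the indexed part of the query, the less in-memory work remains.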

    I would say that Mango is very much a work in progress, but the basic groundwork is done. There are definitely things we can improve on. I've seen it used quite a bit when developers start a new project, because it's quick and simple for basic querying, like finding a user by email address or finding all users named "John Rambo".
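For the "find by email address" case, a typical CouchDB 2.x flow is to create a JSON index with `POST /{db}/_index`, query with `POST /{db}/_find`, and optionally check which index the server picks with `POST /{db}/_explain`. The sketch below only builds those requests with the Python standard library; the server URL, database, index name and email address are hypothetical.

```python
import json
import urllib.request

# Hypothetical values for illustration: server URL, database, index name, email.
COUCH = "http://localhost:5984"
DB = "users"


def json_post(path, body):
    """Build (but do not send) a JSON POST request for the given server path."""
    return urllib.request.Request(
        COUCH + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# 1. Create a JSON index on "email" so the lookup can use a B-tree key range
#    instead of filtering every document in memory.
index_req = json_post(f"/{DB}/_index", {
    "index": {"fields": ["email"]},
    "name": "email-index",
    "type": "json",
})

# 2. The query itself: find by email and return only a few fields.
query = {
    "selector": {"email": {"$eq": "john.rambo@example.com"}},
    "fields": ["_id", "name", "email"],
    "limit": 1,
}
find_req = json_post(f"/{DB}/_find", query)

# 3. Ask the server which index it would use for the same query.
explain_req = json_post(f"/{DB}/_explain", query)

if __name__ == "__main__":
    for req in (index_req, find_req, explain_req):
        print(req.get_method(), req.full_url)
        # urllib.request.urlopen(req) would send it to a running server.
```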

    Hope that helps.

    [1] http://www.redcometlabs.com/blog/2015/12/1/a-look-under-the-covers-of-pouchdb-find