
Why do CouchDB reduce functions receive 'keys' as an argument?


With a CouchDB reduce function:

function(keys, values, rereduce) {
  // ...
}

That gets called like this:

reduce([[key1,id1], [key2,id2], [key3,id3]], [value1,value2,value3], false)

Question 1

What is the reason for passing keys to the reduce function? I have only written relatively simple CouchDB views with reduce functions and would like to know what the use case is for receiving a list of [key1, docid], [key2, docid], etc.

Also, is there ever a time when key1 != key2 != keyX when a reduce function executes?

Question 2

CouchDB's implementation of MapReduce allows for rereduce=true, in which case the reduce function is called like this:

reduce(null, [intermediate1,intermediate2,intermediate3], true)

Here the keys argument is null (unlike when rereduce=false). If keys is useful when rereduce=false, why is there no use case for it when rereduce=true?


Solution

  • What is the use case of the keys argument when rereduce = true?

    There isn't one. That's why the keys argument is null in this case.

    From the documentation (emphasis added):

    Reduce and Rereduce Functions

    redfun(keys, values[, rereduce])

    Arguments:

    • keys – Array of pairs of key-docid for related map function results. Always null if rereduce is running (has true value).
    • values – Array of map function result values.
    • rereduce – Boolean flag to indicate a rereduce run.

    Perhaps what you mean to ask is: Why is the same function used for both reduce and rereduce? I expect there's some history involved, but I can also imagine it's because the same logic can very often serve both cases, so a single function definition avoids duplication. Consider a simple sum reduce function:

    function(keys, values) {
        return sum(values);
    }
    

    Here both keys and rereduce can be ignored entirely. Many other (re)reduce functions follow the same pattern. If two functions had to be used, then this identical function would have to be specified twice.
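
    To make the two-phase call pattern concrete, here is a minimal sketch that runs outside CouchDB. The `sum` helper stands in for CouchDB's built-in of the same name, and the way rows are split into two chunks is purely an assumption for illustration; CouchDB decides the actual chunking internally.

```javascript
// Stand-in for CouchDB's built-in sum() helper, so this runs outside CouchDB.
function sum(values) {
  return values.reduce((total, v) => total + v, 0);
}

// Same logic as the reduce function above; keys and rereduce are ignored.
function reduceFn(keys, values, rereduce) {
  return sum(values);
}

// First pass (rereduce = false): called on chunks of map rows.
// The chunk boundaries here are hypothetical.
const partialA = reduceFn([["a", "id1"], ["b", "id2"]], [1, 2], false);
const partialB = reduceFn([["c", "id3"]], [3], false);

// Second pass (rereduce = true): keys is null, values holds earlier results.
const total = reduceFn(null, [partialA, partialB], true);

console.log(total); // 6
```

    Because summing partial sums gives the same answer as summing the original values, one function body can serve both phases.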


    In response to the additional question in comments:

    what use cases exist for the keys argument when rereduce=false?

    Remember, keys and values can be anything, based on the map function. A common pattern is to emit([foo,bar,baz],null). That is to say, the value may be null, if all the data you care about is already present in the key. In such a case, any reduce function more complex than a simple sum would require use of the keys.
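
    As a rough illustration of that pattern, the sketch below assumes the map function emits `[foo, bar, baz]` keys with null values and that `baz` is numeric. The name `maxBaz` and the sample rows are hypothetical; the point is only that the reduce step has to read the keys because the values carry nothing.

```javascript
// Hypothetical reduce: the largest third key component (baz) across rows.
function maxBaz(keys, values, rereduce) {
  if (rereduce) {
    // keys is null here; values holds maxima from earlier reduce calls.
    return Math.max(...values);
  }
  // keys holds [key, docid] pairs; key is the emitted [foo, bar, baz] array.
  return Math.max(...keys.map(([key]) => key[2]));
}

// Simulated first-pass calls on two made-up chunks of map rows.
const first = maxBaz(
  [[["x", "y", 4], "id1"], [["x", "z", 9], "id2"]],
  [null, null],
  false
);
const second = maxBaz([[["w", "y", 7], "id3"]], [null], false);

// Simulated rereduce over the intermediate results.
const overall = maxBaz(null, [first, second], true);

console.log(overall); // 9
```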

    Further, for grouping operations, using the keys makes sense. Consider a map function with emit(doc.countryCode, ... ) and a possible (incomplete) reduce function:

    function(keys, values, rereduce) {
        const sums = {};
        if (!rereduce) {
            // Each entry in keys is a [key, docid] pair; count rows per key.
            keys.forEach(([key]) => {
                sums[key] = (sums[key] || 0) + 1;
            });
        }
        return sums;
    }
    

    Then given documents:

    • {"countryCode": "us", ...}
    • {"countryCode": "us", ...}
    • {"countryCode": "br", ...}

    You'd get emitted rows (from the map function) of:

    • ["us", ...]
    • ["us", ...]
    • ["br", ...]

    You'd get a reduced result of:

    {"us": 2, "br": 1}
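
    For completeness, here is one way the rereduce branch could be filled in, run outside CouchDB on the three documents above. The function name `countByCountry`, the document IDs, and the chunking are assumptions made for the demonstration, not something CouchDB prescribes.

```javascript
// Sketch of a per-country counter that handles both reduce and rereduce.
function countByCountry(keys, values, rereduce) {
  const sums = {};
  if (rereduce) {
    // values holds partial {countryCode: count} objects; add them together.
    values.forEach((partial) => {
      Object.keys(partial).forEach((code) => {
        sums[code] = (sums[code] || 0) + partial[code];
      });
    });
  } else {
    // keys holds [key, docid] pairs; count one row per emitted key.
    keys.forEach(([key]) => {
      sums[key] = (sums[key] || 0) + 1;
    });
  }
  return sums;
}

// Simulated calls; document IDs and chunk boundaries are made up.
const chunk1 = countByCountry([["us", "doc1"], ["us", "doc2"]], [null, null], false);
const chunk2 = countByCountry([["br", "doc3"]], [null], false);
const merged = countByCountry(null, [chunk1, chunk2], true);

console.log(JSON.stringify(merged)); // {"us":2,"br":1}
```

    For plain per-key counts, the built-in _count reduce queried with group=true is the more usual route; this sketch only shows how the keys argument can be put to use.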