Tags: view, nosql, couchdb, mapreduce, document-oriented-db

Is there anything wrong with creating CouchDB views with null values?


I've been doing a fair amount of work with CouchDB in my spare time recently and really enjoy using it. I find it much more flexible than a relational database, but it's not without its disadvantages.

One big disadvantage is the lack of dynamic queries and view generation, so you have to put a fair amount of work into planning and justifying your views, because you can't put that logic into your application code as you might with SQL.

For example, I wrote a login scheme based on a JSON document template that looked a little bit like this:

{
   "_id": "blah",
   "type": "user",
   "name": "Bob",
   "email": "[email protected]",
   "password": "blah"
}

To prevent the creation of duplicate accounts, I wrote a very basic view to generate a list of user names to look up as keys:

function (doc) { if (doc.type === "user") emit(doc.name, null); }

This seemed reasonably efficient to me. I think it's way better than dragging out an entire list of documents (or even just a reduced number of fields for each document). So I did exactly the same thing to generate a list of email addresses:

function (doc) { if (doc.type === "user") emit(doc.email, null); }
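The two views above can be simulated in a few lines to see exactly what they index. This is a minimal in-memory sketch, not CouchDB itself: `runView` and `lookup` are hypothetical helpers that mimic how CouchDB runs a map function once per document, collects every emitted key/value row, and sorts the rows by key (using plain JavaScript string comparison rather than CouchDB's Unicode collation). The sample documents are made up.

```javascript
// Hypothetical stand-in for CouchDB's view indexer: call the map function
// once per document, collect every emit(key, value) as a row, sort by key.
function runView(docs, mapFn) {
  const rows = [];
  for (const doc of docs) {
    // CouchDB provides emit() to the map function; here it is passed in.
    mapFn(doc, (key, value) => rows.push({ id: doc._id, key, value }));
  }
  // Simplification: plain string ordering instead of CouchDB's ICU collation.
  rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  return rows;
}

// The two views from the question, rewritten to take emit as an argument.
const byName = (doc, emit) => { if (doc.type === "user") emit(doc.name, null); };
const byEmail = (doc, emit) => { if (doc.type === "user") emit(doc.email, null); };

// Rough equivalent of querying a view with ?key=...: rows whose key matches exactly.
function lookup(rows, key) {
  return rows.filter((row) => row.key === key);
}

// Made-up sample data; the "note" document is ignored by both views.
const docs = [
  { _id: "u1", type: "user", name: "Bob", email: "bob@example.com" },
  { _id: "u2", type: "user", name: "Alice", email: "alice@example.com" },
  { _id: "n1", type: "note", name: "Bob" },
];
```

A lookup that returns at least one row means the name (or e-mail) is already taken. Because the emitted value is `null`, each row carries only the key and the id of the document that produced it.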

Can you see where I'm going with this question?

In a relational database, one would simply run two SQL queries against the same table. Is this technique (treating a view as the equivalent of an SQL query) in some way analogous?

Then there's the performance/efficiency issue. Should those two views really be just one? Or is a CouchDB view with keys and no associated value an effective practice? Considering the example above, both of those views would have uses outside of a login scheme: if I ever need to generate a list of user names, I can retrieve them without any additional overhead.

What do you think?


Solution

  • First, you certainly can put the view logic into your application code: all you need is an appropriate build or deploy system that extracts the views from the application and adds them to a design document. What is missing is the ability to generate new queries on the fly.

    Your `emit(doc.field, null)` approach certainly isn't surprising or unusual. In fact, it is the usual pattern for "find document by field" queries, where the document itself is retrieved using `include_docs=true`. There is also no need to merge the two views into one. The only performance-related decision is whether the two views should be placed in the same design document: all views in a design document are updated when any of them is accessed.
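Here is a sketch of how the two views might share one design document, and how a "find document by field" query with `include_docs=true` is addressed. The database name `mydb`, the design document name `users`, the view names `by_name`/`by_email` and the base URL are assumptions; the URL layout `/{db}/_design/{ddoc}/_view/{view}` and the JSON-encoded `key` parameter follow CouchDB's view API. The code only builds the request URL and does not contact a server.

```javascript
// Hypothetical design document "_design/users" holding both views.
// Views in the same design document are indexed together, so querying
// either one brings both up to date.
const designDoc = {
  _id: "_design/users",
  views: {
    by_name: {
      map: 'function (doc) { if (doc.type === "user") emit(doc.name, null); }',
    },
    by_email: {
      map: 'function (doc) { if (doc.type === "user") emit(doc.email, null); }',
    },
  },
};

// Build the URL for a "find document by field" query.
// CouchDB expects the key parameter to be JSON, so the string Bob is sent as "Bob".
function viewUrl(baseUrl, db, ddocName, viewName, key) {
  const params = new URLSearchParams({
    key: JSON.stringify(key),
    include_docs: "true", // return the full document alongside each row
  });
  const path = [db, "_design", ddocName, "_view", viewName]
    .map(encodeURIComponent)
    .join("/");
  return `${baseUrl}/${path}?${params}`;
}
```

Splitting `by_name` and `by_email` into two design documents would let each index be rebuilt independently; keeping them together means a query against one refreshes both.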

    Of course, your approach does not actually guarantee that the e-mails are unique, even if your application tries really hard. Imagine the following circumstances with two client applications A and B:

    A: queries view, determines that `[email protected]` does not exist.
    B: queries view, determines that `[email protected]` does not exist.
    A: creates account with `[email protected]`
    B: creates account with `[email protected]`
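The interleaving above can be replayed deterministically. In this hypothetical sketch a plain object stands in for the database and `emailExists` stands in for querying the e-mail view; because both checks run before either write, both clients see the address as free and two user documents end up sharing it. The e-mail address is made up.

```javascript
// Hypothetical in-memory stand-in for a CouchDB database: an object keyed by _id.
const db = {};
let nextId = 1;

// "Check": scan user documents for the e-mail, as the e-mail view would.
function emailExists(email) {
  return Object.values(db).some((doc) => doc.type === "user" && doc.email === email);
}

// "Create": store a new user under a freshly generated id.
function createUser(email) {
  const _id = String(nextId++).padStart(9, "0");
  db[_id] = { _id, type: "user", email };
  return _id;
}

// Replay the interleaving: both checks happen before either write.
const email = "bob@example.com";
const aSawFree = !emailExists(email); // A: queries view
const bSawFree = !emailExists(email); // B: queries view
if (aSawFree) createUser(email);      // A: creates account
if (bSawFree) createUser(email);      // B: creates account

const duplicates = Object.values(db).filter((doc) => doc.email === email);
console.log(duplicates.length); // 2: check-then-create let both clients through
```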
    

    This is a rare occurrence, but nonetheless possible. A better approach is to keep documents that use the e-mail address as their `_id`, because access to a single document is transactional: it's impossible to create two documents with the same `_id`. Typical example:

    {
      "_id": "[email protected]",
      "type": "email",
      "user": "000000001"
    }
    
    {
      "_id": "000000001",
      "type": "user",
      "email": "[email protected]",
      "firstname": "Test",
      ...
    }
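A minimal sketch of the reservation pattern, assuming an in-memory `Store` that, like CouchDB, refuses to create a second document with an existing `_id` (CouchDB reports this as a 409 Conflict). `Store`, `ConflictError` and `registerUser` are hypothetical names, not part of any CouchDB client library, and the sample e-mail is made up.

```javascript
// Hypothetical in-memory stand-in for CouchDB: put() refuses to create a
// second document with an _id that already exists (CouchDB answers 409 Conflict).
class ConflictError extends Error {}

class Store {
  constructor() {
    this.docs = new Map();
  }
  put(doc) {
    if (this.docs.has(doc._id)) throw new ConflictError(`conflict: ${doc._id}`);
    this.docs.set(doc._id, { ...doc });
  }
}

// Reserve the e-mail first, then create the user it points at.
// Returns true if this client won the reservation, false if the e-mail was taken.
function registerUser(store, email, userId, fields) {
  try {
    store.put({ _id: email, type: "email", user: userId });
  } catch (err) {
    if (err instanceof ConflictError) return false; // e-mail already reserved
    throw err;
  }
  store.put({ _id: userId, type: "user", email, ...fields });
  return true;
}
```

The e-mail document is written first, so the loser of the race stops before creating a user document. The flip side of this ordering is that if the second write fails, the reservation is left pointing at a user that does not exist and has to be cleaned up.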
    

    EDIT: A reservation pattern only works if two clients attempting to create an account for a given e-mail reliably try to access the same document. If you randomly generate a new identifier, then client A will create and reserve document XXXX while client B will create and reserve document YYYY, and you will end up with two different documents that have the same e-mail.

    Again, the only way to perform a transactional "check if it exists, create if it does not" operation is to have all clients alter a single document.
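The point of the EDIT can be checked with a small simulation. `reserveTwice` is a hypothetical helper in which two clients each try to create a reservation document and the store rejects a duplicate `_id`. `freshId` stands in for randomly generated identifiers (a counter is used so the result is repeatable), while `emailId` derives the `_id` from the e-mail address itself.

```javascript
// Hypothetical helper: how many reservation documents do clients A and B
// manage to create for the same e-mail, given a rule for choosing the _id?
function reserveTwice(email, chooseId) {
  const docs = new Map();
  let created = 0;
  for (const client of ["A", "B"]) {
    const _id = chooseId(email, client);
    if (docs.has(_id)) continue; // duplicate _id: the store rejects it as a conflict
    docs.set(_id, { _id, type: "email", email });
    created++;
  }
  return created;
}

// Stand-in for random ids: every attempt gets a new, unrelated identifier.
let counter = 0;
const freshId = () => `id${++counter}`;
// Deterministic id: both clients target the document named after the e-mail.
const emailId = (email) => email;

console.log(reserveTwice("bob@example.com", freshId)); // 2: no collision, duplicate e-mail
console.log(reserveTwice("bob@example.com", emailId)); // 1: second client is rejected
```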