Search code examples
indexinglockingcouchdbtime-complexityb-tree

What data is actually stored in a B-tree database in CouchDB?


I'm wondering what is actually stored in a CouchDB database B-tree? The CouchDB: The Definitive Guide tells that a database B-tree is used for append-only operations and that a database is stored in a single B-tree (besides per-view B-trees).

So I guess the data items that are appended to the database file are revisions of documents, not the whole documents:

            +---------|### ...  
            |           |
   +------|###|------+     ... ---+
   |        |        |            |
+------+ +------+ +------+     +------+
| doc1 | | doc2 | | doc1 | ... | doc1 |
| rev1 | | rev1 | | rev2 |     | rev7 |
+------+ +------+ +------+     +------+

Is it true?

If it is true, then how the current revision of a document is determined based on such a B-tree?

Doesn't it mean, that CouchDB needs a separate "view" database for indexing current revisions of documents to preserve O(log n) access? Wouldn't it lead to race conditions while building such an index? (as far as I know, CouchDB uses no write locks).


Solution

  • The database file on disk is append-only; however the B-tree is conceptually modified in-place. When you update a document,

    1. Its leaf node is written (via append to the DB file)
    2. Its parent node is re-written to reference the new leaf (via append of course)
    3. Repeat step 2 until you update the root node

    When the root node is written, that is effectively when the newer revision is "committed." To find a document, you start at the end of the file, get the root node, and work down to your doc id. The latest revision will always be accessible this way.