
Filtered Sync between CouchDB and PouchDB


I am currently thinking about using CouchDB 2 and PouchDB 7 in the next app I want to write. Basically, I will have a CouchDB instance as central storage, and web clients and mobile apps will each run a PouchDB instance that syncs with it. This works like a charm.

But... how do I do a filtered sync between CouchDB and PouchDB if the filter should be based on document ownership?

I know about the per-user database solutions. But my documents will be shared between the creator of a document and the people he/she adds as readers or writers.

Any solutions in 2018 for this problem? Back in 2016 I was unable to solve this issue and dropped the app idea.


Solution

  • You should include in your documents the information you need to restrict access to them: ownership and the authorized users.
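
    For illustration, such a document could carry the access information in a couple of fields. This is only a sketch: the owner and members field names below are assumptions made for this answer, not anything CouchDB prescribes.

      // Hypothetical document layout: "owner" and "members" are made-up
      // field names used throughout this answer for illustration.
      const doc = {
        _id: "note:shopping-list",
        type: "note",
        owner: "alice",                // creator of the document
        members: ["bob", "carol"],     // users granted read/write access
        title: "Shopping list",
        body: "Milk, eggs, bread"
      };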

    Based on this information, there are two options for defining a filtered replication between CouchDB and PouchDB (check the filtering options).

    1. Based on JavaScript filter functions defined in CouchDB design documents. Filter functions let you implement filtering logic that can use parameters passed with the request as URL parameters, or the user authenticated in CouchDB, both available through the req parameter (see the sketch after this list).

      The main problem with this approach is that you will notice a performance degradation as your database grows: the filter is applied to every doc in the database, even deleted ones, in order to produce a result. So I do not recommend this filtering mechanism if you foresee a significant number of docs in the database. Here you have a sample of this kind of problem.

      A slight improvement over this performance problem is to write your filtering logic in Erlang, which is a bit more complex than the JS option; during my tests I did not manage to get a big gain from it.

    2. In CouchDB 2.x there is the option of performing filtered replication using selectors (also sketched after this list). Selectors can be indexed and are reported to be 10x faster than JS filters. Selectors are defined entirely by the client and are not based on the authentication context in the database. This option scales much better than the previous one.
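
    As a rough sketch of how both options could look from the PouchDB side, reusing the hypothetical owner/members fields from above (the design document and filter names are likewise invented for this example):

      // Option 1: a JS filter function shipped in a CouchDB design document.
      const accessDesignDoc = {
        _id: "_design/access",
        filters: {
          // Pass the requesting user as a URL parameter (req.query.user);
          // req.userCtx.name would give the CouchDB-authenticated user instead.
          by_user: function (doc, req) {
            var user = req.query.user;
            return doc.owner === user ||
                   (doc.members || []).indexOf(user) !== -1;
          }.toString() // design doc functions are stored as strings
          // An Erlang version of this filter would live in a separate design
          // doc declaring "language": "erlang".
        }
      };

      const PouchDB = require("pouchdb");
      const local = new PouchDB("notes");
      const remote = new PouchDB("https://couch.example.com/notes");

      // remote.put(accessDesignDoc) would install the filter on the server.
      // Filtered replication using the design doc filter:
      local.replicate.from(remote, {
        filter: "access/by_user",
        query_params: { user: "alice" }
      });

      // Option 2 (CouchDB 2.x): a Mango selector defined entirely on the client.
      local.replicate.from(remote, {
        selector: {
          $or: [
            { owner: "alice" },
            { members: { $elemMatch: { $eq: "alice" } } }
          ]
        }
      });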

    In any case, filtering allows you to do some database segmentation during the replication process but it is not a security mechanism for document-level read permissions.

    Document write permissions can be enforced using validate document update functions.
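
    A minimal sketch of such a function, again assuming the hypothetical owner/members fields used above: only the owner or a listed member may create, update or delete a document.

      // Hypothetical validate_doc_update function for a _design/security doc.
      const securityDesignDoc = {
        _id: "_design/security",
        validate_doc_update: function (newDoc, oldDoc, userCtx, secObj) {
          if (userCtx.roles.indexOf("_admin") !== -1) {
            return; // server admins may always write
          }
          var doc = oldDoc || newDoc;  // check against the existing doc if any
          var allowed = [doc.owner].concat(doc.members || []);
          if (allowed.indexOf(userCtx.name) === -1) {
            throw({ forbidden: "You are not allowed to modify this document." });
          }
        }.toString()
      };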


    UPDATE: I revisited this answer to offer more precise information about the database filtering mechanisms, and I tested the performance of the different filtering approaches to confirm the statements above.

    I loaded a database with 9,000 docs and measured the time to filter the _changes feed using four techniques: JS filtering, Erlang filtering, Mango selector filtering and doc id filtering, with the following results:

    • JS filtering of 9,000 docs - 4.3 secs
    • Erlang filtering of 9,000 docs - 2.3 secs
    • Mango selector filtering of 9,000 docs - 0.48 secs
    • Doc id filtering of 9,000 docs - 0.01 secs

    The test confirms that JS filtering is the worst option, as it needs to evaluate the filter condition in an external query server process, which introduces additional overhead. Erlang and Mango expressions are evaluated inside CouchDB itself, which represents a real performance gain.

    In order to verify the impact of the number of docs on filtering, I created a database with 20,000 docs and performed the same tests, with the following results:

    • JS filtering of 20,000 docs - 10 secs
    • Erlang filtering of 20,000 docs - 5.45 secs
    • Mango selector filtering of 20,000 docs - 1.07 secs
    • Doc id filtering of 20,000 docs - 0.01 secs

    For JS, Erlang and Mango filtering, the time grows linearly with the number of docs; no index is used by these filtering mechanisms. Doc id filtering is constant, as it is based on the _id index.
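
    For reference, here is a sketch of how such measurements could be reproduced against the _changes endpoint. The database name, filter name and fields reuse the hypothetical examples above, authentication is omitted, and the exact numbers will of course depend on your data and hardware.

      // Rough timing sketch using Node 18+ global fetch against CouchDB.
      const BASE = "http://localhost:5984/notes";

      async function time(label, url, options) {
        const start = Date.now();
        const res = await fetch(url, options);
        await res.json(); // drain the response before stopping the clock
        console.log(label, (Date.now() - start) / 1000, "secs");
      }

      (async () => {
        // 1. JS (or Erlang) filter function from a design document
        await time("JS filter", BASE + "/_changes?filter=access/by_user&user=alice");

        // 2. Mango selector filter: filter=_selector, selector in the POST body
        await time("Mango selector", BASE + "/_changes?filter=_selector", {
          method: "POST",
          headers: { "Content-Type": "application/json" },
          body: JSON.stringify({ selector: { owner: "alice" } })
        });

        // 3. Doc id filter: filter=_doc_ids, ids in the POST body
        await time("Doc ids", BASE + "/_changes?filter=_doc_ids", {
          method: "POST",
          headers: { "Content-Type": "application/json" },
          body: JSON.stringify({ doc_ids: ["note:shopping-list"] })
        });
      })();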