I'm using pouchDb on an electron app. The data was stored in a postgres database before passing to pouchDb. On some cases it wasn't hard to figured out how to structure the data in a document fashion.
My main concern is regarding relations. For example:
I have the data type Projects and the Projects have many events. Right now I have a field called project_id on each event. So when I want to get the events for a project with ID 'project/1' I'll do
_db.allDocs({
include_docs: true,
startkey: 'event',
endkey: 'event\uffff'
}).then(function(response){
filtered = _.filter(response['rows'], function(row){
return row['doc']['project_id'] == 'project/1'
});
result = filtered.map(function(row){
return row['doc']
})
});
I've read that allDocs
is the most performant API, but, Is having a view more convenient on this case?
On the other hand, when I show a list with all the projects, each project needs to show the number of events it has. On this scenario looks like I would have to run allDocs again, with include_docs: false
in order to count the number of events the project has.
Does having a view improves this situation?
On the other hand I'm thinking on having an array with all the events Ids on the Project document so I can easily count how many event it has. In this case should I use allDocs? Is there a way of passing an array of Ids to allDocs? Or would it be better using a loop over that array and call get(id) for each id?
Is this other way more performant than the first one?
Thanks!
Good question! There are many ways to handle relationships in PouchDB. And like many NoSQL databases, each will give you a tradeoff of performance vs. convenience.
The system you describe is not terribly performant. Basically you are fetching every single event in the database (O(n)
) and then filtering in-memory. If you have many events, then n
will be large, meaning it will be very very slow.
You have a few options here. All of them are better than your current system:
map
function, you would emit()
the project _id
s for each event. This creates a secondary index on whatever you put as the key
in the emit()
function._id
s and running allDocs()
with startkey
and endkey
for each one. So it would do one allDocs()
to fetch the project, then a second allDocs()
to fetch the events for that project.new PouchDB('projects')
and new PouchDB('events')
(Roughly, these are listed in order of least performant to most performant.)
#1 is more performant than the system you describe, although it's still not terribly fast, because it requires creating a secondary index, and then after that will essentially do an allDocs()
on the secondary index database as well as on the original database (to fetch the linked docs). So basically you are running allDocs()
three times under the hood – one of which is on whatever you emitted as the key
, which it seems like you don't need, so it would just be wasted.
#2 is much better, because under the hood it runs two fast allDocs()
queries - one to fetch the project, and another to fetch the events. It also doesn't require creating a secondary index; it can use the free _id
index.
#3 also requires two allDocs()
calls. So why is it the fastest? Well, interestingly it's because of how IndexedDB orders read/write operations under the hood. Let's say you are writing to both 'projects'
and 'events'
. What IndexedDB will do is to serialize those two writes, because it can't be sure that the two aren't going to modify the same documents. (When it comes to reads, though, the two queries can run concurrently in either case – in Chrome, at least. I believe Firefox will actually serialize the reads.) So basically if you have two completely separate PouchDBs, representing two completely separate IndexedDBs, then both reads and writes can be done concurrently.
Of course, in the case of a parent-child relationship, you can't know the child IDs in advance, so you have to fetch the parent anyway and then fetch the children. So in that case, there is no performance difference between #2 and #3.
In your case, I would say the best choice is probably #2. It's a nice compromise between perf and convenience, especially since the relational-pouch
plugin already does the work for you.