I'm playing with Mango queries on a CouchDB 2.0 instance, through the fantastic pouchdb-find.
A few times I got the dreaded no matching index found, create an index to optimize query time
warning even though I was using indexed fields.
Just now I got it when selecting "type": {"$in": ["a", "b"]}
or the equivalent "$or": [{"type": "a"}, {"type": "b"}]
, even though an index on type
exists.
Googling (cloudant query docs, pouchdb-find docs, SO question) didn't help, and in the latter @nlawson says that some predicates ($ne
in the aforementioned question, but maybe my $in
/ $or
fall into the same basket?) "currently do not use any index".
Thanks!
Answering my question: no, $in
/or
queries cannot use indices. I asked the question in this user@couchdb mailing list thread, where Garren Smith answered and pointed to Understanding Mango View-Based Indexes vs. Search-Based Indexes and A look under the covers of PouchDB-find. To quote Garren,
The reason that adding
"_id": {"$gt": 0}
works is because pouchdb-find/mango fetches all the docs using the_all_docs
index and then processes the$in
operator in memory.If you have a large database this will hurt. But you can use a better value than
0
to reduce the number of documents that need to be sorted in memory, which is a good thing.
So, careful, "_id": {"$gt": 0}
is by no means a way to use an index (that's the impression I got when @markwatsonatx suggested it), it's only a way to suppress pouchdb-find's warning, by telling it "I know this won't fit a map/reduce, I'll be running in-memory operations on allDocs, and I'm aware of the perf. consequences". Also,
The warning is just to help anyone new to using Mango that what they are doing isn't the best way on a large database but will be fine on a small database. It's a fine way to experiment but once you start noticing performance issues, creating an index is the way forward.
I'll complement with a little benchmark I made, comparing different approaches to fetch {10, 100, 1000, 10000} "cases" (using selectors on an indexed field) from a db containing [10000 cases, 100000 noise documents]
|number of cases fetched|10 |100 |1000 |10000 |
|-----------------------|------|-------|--------|------|
|$in |2452ms|2539ms |2474ms |5032ms|
|$in + $gt |905ms |784ms |1014ms |3805ms|
|$in + $gt + $lt |5ms |13ms |100ms |3854ms|
|$or |2638ms|11763ms|101279ms|- |