Tags: mongodb, mongodb-query, full-text-search, text-search

MongoDB text search with an exact phrase


I have a collection with the following documents:

{
    "_id" : ObjectId("5ad609a2ac1a8b644180936a"),
    "content" : "Coffee and cakes..."
},
{
    "_id" : ObjectId("5ad609baac1a8b644180936b"),
    "content" : "coffee shop..."
}

This text search query:

find({ $text: { $search: "\"coffee shop\" cakes" } })

returns only the second document, but I expected it to return both documents. What is the problem?


Solution

  • This ...

    find({ $text: { $search: "coffee shop cakes" } })
    

    ... will match any document whose content field contains at least one of the terms "coffee", "shop", or "cakes" (the terms are combined with a logical OR).

    But this ...

    find({ $text: { $search: "\"coffee shop\" cakes" } })
    

    ... will match only documents whose content field contains the exact phrase "coffee shop".

    I think you are expecting the query to combine both of the above behaviours, i.e. to match documents that contain either the phrase ("coffee shop") or the extra term ("cakes"). However, this is not how MongoDB treats a combination of a phrase and additional terms.

    From the docs:

    If the $search string includes a phrase and individual terms, text search will only match the documents that include the phrase.

    Based on these docs, the query "\"coffee shop\" cakes" is evaluated as:

    "coffee shop" AND ("cakes" OR "coffee" OR "shop")
    

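    This correctly matches only the second document: the first document contains "coffee" and "cakes" but not the phrase "coffee shop", so it fails the required phrase condition.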

    Note: the text index docs contradict this. According to those docs, the query "\"coffee shop\" cakes" would be evaluated as "coffee shop" OR "cakes", but the behaviour you are observing is consistent with the $text operator docs quoted above.

    Thanks to @RahulRaj for raising this issue with MongoDB. Their response confirms that the text index docs are incorrect:

    As you correctly note, there is an inconsistency in the documentation between these two pages. We're tracking this fix to the documentation in DOCS-10382.

    https://docs.mongodb.com/manual/reference/operator/query/text/#phrases correctly describes the current implementation of this feature.
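To make the evaluation rule from the $text operator docs concrete, here is a small, self-contained Node.js sketch that simulates it in memory. It is not MongoDB's implementation: `matchesTextSearch`, `parseSearch` and `tokenize` are hypothetical helpers, and the sketch deliberately ignores stemming, stop words, language settings and relevance scoring, all of which a real text index applies. The integer `_id` values are placeholders for the ObjectIds in the question.

```javascript
// Minimal in-memory simulation of the $text rule "phrase AND (any term)".
// Assumptions (hypothetical helpers, NOT MongoDB's implementation):
//  - lowercase tokenisation on non-alphanumeric characters
//  - no stemming, no stop words, no scoring
//  - a phrase matches if it occurs as a case-insensitive substring
//  - every quoted phrase is required (assumed for the multi-phrase case)

// Lowercase the text and split it into alphanumeric words.
function tokenize(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Split a $search string into quoted phrases and plain terms.
function parseSearch(search) {
  const phrases = [];
  const rest = search.replace(/"([^"]*)"/g, (_match, phrase) => {
    phrases.push(phrase.toLowerCase());
    return " " + phrase + " "; // the phrase's words also count as terms
  });
  return { phrases, terms: tokenize(rest) };
}

// A document matches if it contains every phrase AND at least one term.
function matchesTextSearch(content, search) {
  const { phrases, terms } = parseSearch(search);
  const lower = content.toLowerCase();
  const words = new Set(tokenize(content));
  const allPhrases = phrases.every((p) => lower.includes(p));
  const anyTerm = terms.some((t) => words.has(t));
  return allPhrases && anyTerm;
}

const docs = [
  { _id: 1, content: "Coffee and cakes..." },
  { _id: 2, content: "coffee shop..." },
];

for (const search of ["coffee shop cakes", '"coffee shop" cakes']) {
  const ids = docs
    .filter((d) => matchesTextSearch(d.content, search))
    .map((d) => d._id);
  // expected ids: [1, 2] for the plain terms, [2] for the phrase query
  console.log(search, "->", ids);
}
```

In this model the phrase acts as a mandatory filter, while the loose terms (which include the words of the phrase itself) only need a single hit. That is why adding "cakes" to the search string cannot bring the first document back.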