node.js mongodb mongoose full-text-indexing

Text search whitespace escape


I'm using Mongoose in Node.js to perform a text search:

var mongoose = require('mongoose');
var config = require('../config');
var mongoosePaginate = require('mongoose-paginate'); 
var poiSchema = mongoose.Schema({
    city:String,
    cap:String,
    country:String,
    address: String,
    description: String,
    latitude: Number,
    longitude: Number,
    title: String,
    url: String,
    images:Array,
    freeText:String,
    owner:String,
});
poiSchema.index({'$**': 'text'});

poiSchema.plugin(mongoosePaginate);
mongoose.Promise = global.Promise;
mongoose.connect(config.database);
module.exports = mongoose.model('Poi', poiSchema);

As you can see here:

poiSchema.index({'$**': 'text'});

I create a wildcard text index, which covers every field in my schema that contains string data.

To perform a text search, I use this code:

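var term = "a search term";

var query = {'$text': {'$search': term}};
// Assumes this runs inside an Express route handler, where `res` is the response
Poi.paginate(query, {}, function(err, pois) {
    if (err) {
        return res.status(500).json({error: err.message});
    }
    // No result: fall back to an empty page
    if (!pois) {
        pois = {
            docs: [],
            total: 0
        };
    }
    res.json({search: pois.docs, total: pois.total});
});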

Unfortunately, when the search term contains whitespace, the query returns every document in the collection that matches any single word of the term, as if the term had been split on whitespace.

I assume the text index uses whitespace as its tokenizer.

I need to know how to escape the whitespace so that the search matches only fields containing the entire search term, without splitting it.

I tried replacing the whitespace with \\, but nothing changed.

Could someone please help me?


Solution

  • MongoDB allows text search queries on string content with support for case insensitivity, delimiters, stop words and stemming. The terms in your search string are, by default, OR'ed. From the docs, the $search string is ...

    A string of terms that MongoDB parses and uses to query the text index. MongoDB performs a logical OR search of the terms unless specified as a phrase.

    So, if at least one term in your $search string matches, MongoDB returns that document. MongoDB searches using all of the terms (where a term is a string separated by whitespace).
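To make the splitting concrete, here is a simplified JavaScript model of how a $search string is read: quoted substrings become phrases and the rest is split on whitespace into terms. The function name parseSearch is hypothetical, and the model deliberately ignores stemming, stop words, punctuation delimiters and negation, so treat it as an illustration rather than MongoDB's actual parser.

```javascript
// Simplified model of reading a $search string (NOT MongoDB's real parser):
// quoted substrings are phrases, everything else is split on whitespace.
// Stemming, stop words, punctuation delimiters and negation are ignored.
function parseSearch(search) {
  const phrases = [];
  // Pull out every "quoted phrase" and replace it with a space
  const rest = search.replace(/"([^"]*)"/g, (_match, phrase) => {
    if (phrase.trim() !== "") {
      phrases.push(phrase.trim());
    }
    return " ";
  });
  // Whatever is left is split on runs of whitespace into individual terms
  const terms = rest.split(/\s+/).filter((t) => t !== "");
  return { phrases, terms };
}

console.log(parseSearch("a search term"));
// { phrases: [], terms: [ 'a', 'search', 'term' ] }
console.log(parseSearch('"a search term"'));
// { phrases: [ 'a search term' ], terms: [] }
```

Without quotes the input becomes three separate terms that are OR'ed; with quotes it stays one phrase.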

    You can change this behaviour by specifying a phrase, which you do by enclosing multiple terms in double quotes. In your question, I think you want to search for the exact phrase "a search term", so just enclose that phrase in escaped double quotes.
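Applied to the question's Node.js code, a minimal sketch might look like the following. buildPhraseQuery is a hypothetical helper, not part of Mongoose or mongoose-paginate; stripping any double quotes the user typed is an assumption made so that user input cannot open or close a phrase of its own.

```javascript
// Hypothetical helper: wrap the user's input in double quotes so MongoDB
// treats the whole input as a single phrase instead of OR'ed terms.
function buildPhraseQuery(term) {
  // Assumption: drop any double quotes the user typed, so they cannot
  // open or close a phrase of their own inside the search string.
  const cleaned = term.replace(/"/g, "").trim();
  return { $text: { $search: '"' + cleaned + '"' } };
}

const query = buildPhraseQuery("a search term");
console.log(JSON.stringify(query));
// {"$text":{"$search":"\"a search term\""}}
```

The returned object can be passed where the question passes `query`, for example `Poi.paginate(buildPhraseQuery(term), {}, callback)`. You may also want to reject empty input before calling it, since the sketch does not check for that.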

    Here are some examples:

    • Given these documents:

      { "_id" : ..., "name" : "search" }
      { "_id" : ..., "name" : "term" }
      { "_id" : ..., "name" : "a search term" }
      
    • The following queries will return ...

      // returns the third document because that is the only
      // document which contains the phrase: 'a search term'
      db.collection.find({ $text: { $search: "\"a search term\"" } })
      
      // returns all three documents because each document contains
      // at least one of the terms in this search string
      db.collection.find({ $text: { $search: "a search term" } })
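The two results above can be reproduced with a small in-memory sketch. It assumes the three sample documents have _id values 1, 2 and 3 (the originals are elided), and the functions parseSearch, matches and textSearch are hypothetical. It only mimics the OR-of-terms versus phrase behaviour described here; it uses no text index and ignores stemming and stop words.

```javascript
// In-memory model of the two queries above; NOT how MongoDB evaluates $text.
// Assumed _id values 1, 2, 3 stand in for the elided ones.
const docs = [
  { _id: 1, name: "search" },
  { _id: 2, name: "term" },
  { _id: 3, name: "a search term" },
];

// Quoted substrings become phrases, the rest is split on whitespace
function parseSearch(search) {
  const phrases = [];
  const rest = search.replace(/"([^"]*)"/g, (_match, phrase) => {
    if (phrase.trim() !== "") phrases.push(phrase.trim().toLowerCase());
    return " ";
  });
  const terms = rest.toLowerCase().split(/\s+/).filter((t) => t !== "");
  return { phrases, terms };
}

function matches(doc, search) {
  const text = doc.name.toLowerCase();
  const words = text.split(/\s+/);
  const { phrases, terms } = parseSearch(search);
  // In this model, when phrases are given, every phrase must appear in the text
  if (phrases.length > 0) {
    return phrases.every((p) => text.includes(p));
  }
  // Otherwise the terms are OR'ed: one matching word is enough
  return terms.some((t) => words.includes(t));
}

// Returns the _id of every matching document
function textSearch(collection, search) {
  return collection.filter((doc) => matches(doc, search)).map((doc) => doc._id);
}

console.log(textSearch(docs, '"a search term"')); // [ 3 ]
console.log(textSearch(docs, "a search term"));   // [ 1, 2, 3 ]
```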
      

    So, in summary, you "escape whitespace" by enclosing your search terms in escaped double quotes: instead of "a search term", use "\"a search term\"".
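In JavaScript the backslash-escaped form is only one way to spell that string. A minimal sketch of equivalent spellings, any of which yields the same $search value (the variable names are illustrative):

```javascript
// Three equivalent ways to write the same phrase search string in JavaScript.
// Only the source code differs; the resulting strings are identical.
const escaped = "\"a search term\"";    // double quotes, inner quotes escaped
const singleQuoted = '"a search term"'; // single quotes, no escaping needed
const term = "a search term";
const templated = `"${term}"`;          // template literal built from a variable

console.log(escaped === singleQuoted && singleQuoted === templated); // true
console.log(escaped.length); // 15
```

The template-literal form is handy when the phrase comes from a variable, as `term` does in the question.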