I'm trying to implement diacritics-insensitive search with the MongoDB C# driver, e.g. when I search for "Joao", it should return all the results containing either "João" or "Joao" (if any).
The following command works in MongoDB Compass, i.e. if I run it against my MongoDB collection (which currently contains only a document with "João" and none with "Joao"), it returns the correct document:
db.profiles.find({FirstName:"Joao"}).collation({locale:"pt", strength: 1})
However, when I try to port it to C#, it doesn't work (i.e. it returns no results if I search for "Joao", only if I search for "João"):
private IFindFluent<ProfessionalProfile, ProfessionalProfile> BuildProfessionalLocationFilter(BaseQuery criteria)
{
    FilterDefinition<ProfessionalProfile> filter = FilterDefinition<ProfessionalProfile>.Empty;
    if (!string.IsNullOrEmpty(criteria.SearchWords))
    {
        var searchWords = criteria.SearchWords.ToLower().Trim().Split(' ', StringSplitOptions.RemoveEmptyEntries).ToList();
        FilterDefinition<ProfessionalProfile> searchWordFilter = FilterDefinition<ProfessionalProfile>.Empty;
        foreach (string searchWord in searchWords)
        {
            var newFilter = Builders<ProfessionalProfile>.Filter.And(Builders<ProfessionalProfile>.Filter.Where(profile =>
                profile.FirstName.ToLower().Contains(searchWord) ||
                profile.LastName.ToLower().Contains(searchWord) ||
                profile.Description.ToLower().Contains(searchWord) ||
                profile.Email.ToLower().Contains(searchWord) ||
                profile.Facebook.ToLower().Contains(searchWord) ||
                profile.Instagram.ToLower().Contains(searchWord) ||
                profile.LinkedIn.ToLower().Contains(searchWord) ||
                profile.Locations.Any(location =>
                    location.Name.ToLower().Contains(searchWord) ||
                    location.District.Name.ToLower().Contains(searchWord) ||
                    location.Council.Name.ToLower().Contains(searchWord)) &&
                !profile.IsDeleted &&
                profile.WizardStep == 6));
            if (searchWordFilter == FilterDefinition<ProfessionalProfile>.Empty)
                searchWordFilter = newFilter;
            else
                searchWordFilter = searchWordFilter | newFilter;
        }
        filter = filter & searchWordFilter;
    }
    IFindFluent<ProfessionalProfile, ProfessionalProfile> findFluent = _professionalCollection.Find(filter);
    findFluent.Options.Collation = new Collation("pt", strength: CollationStrength.Primary);
    //IFindFluent<ProfessionalProfile, ProfessionalProfile> findFluent = _professionalCollection.Find(filter, new FindOptions() { Collation = new Collation("pt", strength: CollationStrength.Primary) } );
    return findFluent;
}
Please note that I've also tried the commented line above, with the same (predictable) results.
What could be missing? Thanks.
EDIT:
As asked in the comment below by @Đĵ ΝιΓΞΗΛψΚ, I'm adding some information about the profile collection collation.
Mongo shell command:
db.getCollectionInfos({ name: 'profiles' })
Output:
[
  {
    name: 'profiles',
    type: 'collection',
    options: {},
    info: {
      readOnly: false,
      uuid: UUID("4606f027-03b1-45e8-bf7f-6c99461db042")
    },
    idIndex: { v: 2, key: [Object], name: '_id_', ns: 'dev.profiles' }
  }
]
it took a bit of investigating since documentation on the issue is scarce, but i think i figured out what's happening.
profile.FirstName.ToLower().Contains(searchWord) gets translated by the driver to a $regex query.
from what i can see, the regex search in mongodb is not collation aware. so you can't use regex functionality to do diacritic insensitive searches.
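by contrast, an equality filter does honour the collation, which is why the Compass command in the question finds "João". a minimal sketch of that same query with the official driver, assuming a trimmed-down ProfessionalProfile class with an ObjectId Id (the real class and its Id type may differ); the CollationSketch class and FindByFirstNameAsync method are hypothetical names used only for illustration:

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using MongoDB.Bson;
using MongoDB.Driver;

// Trimmed-down stand-in for the real entity (assumption: ObjectId key).
public class ProfessionalProfile
{
    public ObjectId Id { get; set; }
    public string FirstName { get; set; }
}

public static class CollationSketch
{
    // Hypothetical helper: mirrors
    // db.profiles.find({FirstName: "Joao"}).collation({locale: "pt", strength: 1})
    public static Task<List<ProfessionalProfile>> FindByFirstNameAsync(
        IMongoCollection<ProfessionalProfile> profiles, string firstName)
    {
        // Plain equality match: string comparison here is subject to the collation.
        var filter = Builders<ProfessionalProfile>.Filter.Eq(p => p.FirstName, firstName);

        // Primary strength compares base characters only, ignoring case and diacritics.
        var options = new FindOptions
        {
            Collation = new Collation("pt", strength: CollationStrength.Primary)
        };

        return profiles.Find(filter, options).ToListAsync();
    }
}
```

this only matches whole field values, so it is not a replacement for the Contains() calls in the question.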
however, the solution to your requirement is to create a Text Index containing all of the fields you want to search in and utilize that index to do a diacritic & case insensitive search for your search words. it will also be the most efficient way to achieve your requirement.
the one limitation of using a text index is that it won't let you search for partial matches of words such as Jo. mongodb fulltext search only works on complete words unfortunately.
here's a test program (using the MongoDB.Entities library for brevity):
using MongoDB.Driver;
using MongoDB.Entities;
using System.Threading.Tasks;

namespace TestApplication
{
    public class ProfessionalProfile : Entity
    {
        public string FirstName { get; set; }
        public string LastName { get; set; }
        public bool IsDeleted { get; set; }
        public int WizardStep { get; set; }
    }

    public static class Program
    {
        private static async Task Main()
        {
            await DB.InitAsync("test", "localhost");

            await DB.Index<ProfessionalProfile>()
                    .Key(p => p.FirstName, KeyType.Text)
                    .Key(p => p.LastName, KeyType.Text)
                    .CreateAsync();

            await new[] {
                new ProfessionalProfile{ FirstName = "João", LastName = "Balboa", IsDeleted = false, WizardStep = 6},
                new ProfessionalProfile{ FirstName = "Joao", LastName = "Balboa", IsDeleted = false, WizardStep = 6},
            }.SaveAsync();

            IAggregateFluent<ProfessionalProfile> aggregateFluent =
                DB.FluentTextSearch<ProfessionalProfile>(
                    searchType: Search.Full,
                    searchTerm: "Joao Mary John",
                    caseSensitive: false,
                    diacriticSensitive: false)
                  .Match(p => !p.IsDeleted && p.WizardStep == 6);

            var result = await aggregateFluent.ToListAsync();
        }
    }
}
i'm sure you won't have much trouble translating that to mongo driver code.
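for reference, a rough translation to the official driver might look like the sketch below. it is untested against your data and makes a few assumptions: the entity has an ObjectId Id, only FirstName and LastName are indexed (as in the test program above), and the TextSearchSketch class with its two methods are hypothetical names. keep in mind that a collection can have only one text index, so every field you want to search has to go into the same index.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using MongoDB.Bson;
using MongoDB.Driver;

// Trimmed-down stand-in for the real entity (assumption: ObjectId key).
public class ProfessionalProfile
{
    public ObjectId Id { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public bool IsDeleted { get; set; }
    public int WizardStep { get; set; }
}

public static class TextSearchSketch
{
    // Creates one compound text index over the searchable fields.
    public static Task<string> EnsureTextIndexAsync(IMongoCollection<ProfessionalProfile> profiles)
    {
        var keys = Builders<ProfessionalProfile>.IndexKeys
            .Text(p => p.FirstName)
            .Text(p => p.LastName);

        return profiles.Indexes.CreateOneAsync(new CreateIndexModel<ProfessionalProfile>(keys));
    }

    // Runs a $text search; space-separated terms are OR-ed by the server.
    public static Task<List<ProfessionalProfile>> SearchAsync(
        IMongoCollection<ProfessionalProfile> profiles, string searchTerm)
    {
        var textOptions = new TextSearchOptions
        {
            CaseSensitive = false,
            DiacriticSensitive = false
        };

        // Combine the text search with the same status conditions as the test program.
        var filter =
            Builders<ProfessionalProfile>.Filter.Text(searchTerm, textOptions) &
            Builders<ProfessionalProfile>.Filter.Where(p => !p.IsDeleted && p.WizardStep == 6);

        return profiles.Find(filter).ToListAsync();
    }
}
```

usage would be something like calling EnsureTextIndexAsync once at startup and then SearchAsync(profiles, "Joao Mary John") per request. no collation is needed on the find here, since the text index handles case and diacritics itself.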