I have a question about a peculiar behavior I noticed in my custom analyzer (as well as in the fr.microsoft analyzer). The Analyze API tests below use the “fr.microsoft” analyzer, but I saw the exact same behavior with my “text_contains_search_custom_analyzer” custom analyzer (which makes sense, as I based it on the fr.microsoft analyzer).
UAT reported that when they search for “femme” (singular), they expect documents containing “femmes” (plural) to be found as well. But when I tested with the Analyze API, it appears that Azure Search tokenizes a plural word into both its plural and singular forms, whereas a singular word produces only the singular token. See below for examples.
Is there a way I can allow a user to search for the singular version of a word, but still include the plural version of that word in the search results? Or will I need to use synonyms to overcome this issue?
Request with “femme”:

    { "analyzer": "fr.microsoft", "text": "femme" }

Response from “femme”:

    {
      "@odata.context": "https://EXAMPLESEARCHINSTANCE.search.windows.net/$metadata#Microsoft.Azure.Search.V2016_09_01.AnalyzeResult",
      "tokens": [
        { "token": "femme", "startOffset": 0, "endOffset": 5, "position": 0 }
      ]
    }

Request with “femmes”:

    { "analyzer": "fr.microsoft", "text": "femmes" }

Response from “femmes”:

    {
      "@odata.context": "https://EXAMPLESEARCHINSTANCE.search.windows.net/$metadata#Microsoft.Azure.Search.V2016_09_01.AnalyzeResult",
      "tokens": [
        { "token": "femme", "startOffset": 0, "endOffset": 6, "position": 0 },
        { "token": "femmes", "startOffset": 0, "endOffset": 6, "position": 0 }
      ]
    }
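For anyone who wants to reproduce these calls, here is a minimal Python sketch that builds (but does not send) the Analyze request. The index name `my-index` and the key placeholder are hypothetical, and the `api-version` value is only inferred from the `V2016_09_01` in the response context above, so adjust all three to your own service.

```python
import json
import urllib.request


def build_analyze_request(service, index_name, api_key, analyzer, text,
                          api_version="2016-09-01"):
    """Build a POST request for the Analyze API without sending it."""
    url = (
        f"https://{service}.search.windows.net"
        f"/indexes/{index_name}/analyze?api-version={api_version}"
    )
    # Same JSON body as in the examples above.
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    headers = {"Content-Type": "application/json", "api-key": api_key}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# "my-index" and the key are placeholders (assumptions, not from my setup).
req = build_analyze_request(
    "EXAMPLESEARCHINSTANCE", "my-index", "<admin-api-key>",
    "fr.microsoft", "femmes",
)
print(req.full_url)
print(req.data.decode("utf-8"))
```

Passing `req` to `urllib.request.urlopen` would send it; the response should then contain the `tokens` array shown above.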
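Just to add to yoape's response, the fr.microsoft analyzer reduces inflected words to their base form. In your case, the word femmes is reduced to its singular form femme. All the cases that you described will work. A document containing femmes is indexed under both femme and femmes, so a query for femme matches it. A query for femmes is analyzed into femme and femmes, so it also matches documents that contain only femme.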
The key takeaway here is that the analyzer processes not only the documents at indexing time but also the query terms at query time. In both cases, terms are normalized according to language-specific rules.
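To make this concrete, here is a small Python sketch of the idea. It is not how Azure Search is implemented: `analyze` is a hypothetical stand-in that simply hard-codes the two Analyze API responses from the question, and the two sample documents are made up for illustration.

```python
from collections import defaultdict

# Hypothetical stand-in for fr.microsoft, hard-coded to mirror the
# Analyze API output quoted in the question.
ANALYZE_OUTPUT = {
    "femme": ["femme"],
    "femmes": ["femme", "femmes"],
}


def analyze(text):
    """Return the tokens emitted for each whitespace-separated word."""
    tokens = []
    for word in text.lower().split():
        tokens.extend(ANALYZE_OUTPUT.get(word, [word]))
    return tokens


def build_index(docs):
    """Indexing time: map every analyzed token to the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in analyze(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Query time: analyze the query the same way, then match any token."""
    hits = set()
    for token in analyze(query):
        hits |= index.get(token, set())
    return hits


# Made-up sample documents: one singular, one plural.
docs = {1: "une femme", 2: "deux femmes"}
index = build_index(docs)

print(sorted(search(index, "femme")))   # [1, 2]
print(sorted(search(index, "femmes")))  # [1, 2]
```

Document 2 is found by the singular query because its plural word was stored under the base form femme when it was indexed, not because the query was expanded.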
I hope that explains it.