azure lucene tokenize azure-cognitive-search

How to get character matches in Azure Search index instead of substrings

I created an Azure index for my DocumentDB collection, and it seems to be working fine. The index has properties for a user account like FirstName, LastName, and Username. The problem is the default tokenizer seems to be tokenizing the Username field. While I want token matches for the first two fields, I'd like character matching for the usernames. Is there an easy way to achieve this through the Azure portal? If not, how can I achieve this?

Solution

Adding another answer based on your comments above. In the best case, what you want is prefix, suffix, and wildcard search. So if the username were user246392, you could find it by typing "use", "392", or even "er246". The prefix case is easy: search for use* and it will find it.

Kendra Little wrote a really nice blog post on how to use regular expressions with Azure Search, which lets you do the full wildcard part of your ask (e.g. searching for "392").
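As a rough illustration of the two query styles above, here is a minimal Python sketch that only builds the JSON bodies you would POST to the index's search endpoint. The helper names and the "Username" field are assumptions for illustration; `search`, `searchFields`, and `queryType` are the standard search request parameters, and regular expressions need `queryType` set to `full`. The fragment is inserted as-is, so characters that are special in Lucene regular expressions would need escaping.

```python
import json


def prefix_query(term: str, field: str) -> dict:
    """Request body for a prefix search such as use* (hypothetical helper)."""
    return {"search": f"{term}*", "searchFields": field, "queryType": "simple"}


def contains_query(fragment: str, field: str) -> dict:
    """Request body for a 'contains' search via a Lucene regular expression."""
    # Regular expressions are written between forward slashes and are only
    # recognised by the full Lucene query parser.
    return {"search": f"/.*{fragment}.*/", "searchFields": field, "queryType": "full"}


if __name__ == "__main__":
    print(json.dumps(prefix_query("use", "Username")))
    print(json.dumps(contains_query("392", "Username")))
```

Keep in mind that a regular expression with a leading `.*` has to scan many terms in the index, so it tends to be slower than a prefix query.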

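If you want to do suffix search, there is a quite efficient trick: create a new field with a custom analyzer that indexes each value with its characters reversed. Reversing turns a suffix into a prefix, and an edge n-gram filter then indexes every prefix of the reversed value; at query time the search text is reversed the same way. Here is an example of an index schema that allows this (on the suffixName field):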

{
    "name": "people",
    "fields": [
        { "name": "id", "type": "Edm.String", "key": true, "searchable": false },
        { "name": "suffixName", "type": "Edm.String", "searchable": true, "indexAnalyzer": "suffixIndexingAnalyzer", "searchAnalyzer": "reverseText" }
    ],
    "analyzers": [
        {
            "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
            "name": "suffixIndexingAnalyzer",
            "tokenizer": "keyword_v2",
            "tokenFilters": [
                "asciifolding",
                "lowercase",
                "reverse",
                "my_edgeNGramForSuffix"
            ],
            "charFilters": []
        },
        {
            "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
            "name": "reverseText",
            "tokenizer": "classic",
            "tokenFilters": [
                "lowercase",
                "reverse"
            ],
            "charFilters": []
        }
    ],
    "tokenFilters": [
        {
            "@odata.type": "#Microsoft.Azure.Search.EdgeNGramTokenFilterV2",
            "name": "my_edgeNGramForSuffix",
            "minGram": 2,
            "maxGram": 25,
            "side": "front"
        }
    ]
}
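
To see why this schema gives suffix matching, here is a small Python sketch that imitates the two analyzers locally. It is only an approximation under stated assumptions: the function names are made up for illustration, the asciifolding step is left out, and the whole value is treated as a single token (which is what keyword_v2 does, and what the classic tokenizer would typically do for a plain alphanumeric username). It does not call Azure Search.

```python
def edge_ngrams(token: str, min_gram: int = 2, max_gram: int = 25) -> list:
    """Front edge n-grams of a token, mirroring minGram/maxGram in the schema."""
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


def index_terms(value: str) -> set:
    """Approximate suffixIndexingAnalyzer: lowercase, reverse, edge n-grams."""
    reversed_token = value.lower()[::-1]
    return set(edge_ngrams(reversed_token))


def search_term(query: str) -> str:
    """Approximate reverseText: lowercase, then reverse the query text."""
    return query.lower()[::-1]


def matches_suffix(value: str, query: str) -> bool:
    """True when the reversed query equals one of the indexed n-grams."""
    return search_term(query) in index_terms(value)


if __name__ == "__main__":
    for query in ("392", "246392", "er246", "use"):
        print(query, matches_suffix("user246392", query))
```

In this sketch "392" and "246392" match because their reversed forms are prefixes of the reversed username, while "er246" and "use" do not, which is why the regular expression or prefix query is still needed for those cases. A one-character query would not match either, since minGram is 2.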