Azure Search Autocomplete with Escape Special Characters


I am having trouble getting special characters such as -, @, and # to appear in autocomplete results.

I am using .NET Core with C# and the Microsoft.Azure.Search package.

I am new to Azure Search, so a detailed explanation with some guidance would be much appreciated.

So far, I have created an index with fields and a suggester, as shown below.

private async Task StartIndexAsync(bool resetIndexer = true)
{
    await CreateIndexAsync(new[]{
            new Field(nameof(ProjectSearchModel.Id),                      DataType.String)     { IsKey = true,  IsSearchable = false, IsFilterable = false, IsSortable = false, IsFacetable = false, IsRetrievable = true},
            new Field(nameof(ProjectSearchModel.Name),                    DataType.String)     { IsKey = false,  IsSearchable = false, IsFilterable = false, IsSortable = false, IsFacetable = false, IsRetrievable = true},
            new Field(nameof(ProjectSearchModel.Number),                  DataType.String)     { IsKey = false,  IsSearchable = false, IsFilterable = false, IsSortable = false, IsFacetable = false, IsRetrievable = true}
            },
        new[] {
            nameof(ProjectSearchModel.Name),
            nameof(ProjectSearchModel.Number),
        });

    await CreateDatasourceAsync();
    await StartIndexerAsync(resetIndexer);
}
internal async Task CreateIndexAsync(string indexName, IList<Field> mapFields, IList<string> sugFields)
{
    // Create the Azure Search index based on the included schema
    try
    {
        var definition = new Index()
        {
            Name = indexName,
            Fields = mapFields,
            Suggesters = new List<Suggester>() {new Suggester()
            {
                Name = "sg",
                SourceFields = sugFields,
            }}
        };

        await _searchClient.Indexes.CreateOrUpdateAsync(definition);
    }
    catch (Exception ex)
    {
        _logger.LogError("Error creating index: {0}\r\n", ex.Message);
    }
}

With this index setup, I am calling autocomplete with the function below.

public override async Task<AutocompleteResult> AutocompleteAsync(int take, string text)
{
    // Setup the suggest parameters.
    var parameters = new AutocompleteParameters()
    {
        SearchFields = new [] { "Name", "Number"},
        AutocompleteMode = AutocompleteMode.TwoTerms,
        UseFuzzyMatching = true,
        Top = take
    };
    var completeResult = await base.AutocompleteAsync(parameters, text);
    return completeResult;
}

My expected result would be pyh2982@gmail.com for the Name field when I pass pyh as the text. However, the actual result is just pyh2982 gmail.com, with the @ in between missing.

I have read a bit about analyzers, but I am not sure which one I should choose.

Any help is appreciated!! Thanks!


Solution

  • An analyzer is the component responsible for tokenising your content before it is indexed. The standard analyzer lowercases the text and splits it into tokens at word boundaries, so, as far as I know, separators such as '@' are dropped rather than indexed, which would explain why your email comes back as pyh2982 gmail.com. As a test, run autocomplete without the '@' and '-' symbols, for example pyh2982 gmail com, and check whether it works for you.

    PS: If you are using the full Lucene query syntax (queryType=full), you need to escape special characters. See: https://learn.microsoft.com/en-us/azure/search/query-lucene-syntax
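
    To illustrate the escaping mentioned in the PS, here is a minimal sketch of a helper that puts a backslash in front of each character the full Lucene syntax reserves. The class name LuceneQueryEscaper and the method Escape are made up for this example and are not part of Microsoft.Azure.Search. The character list is the one given on the linked documentation page.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of the Microsoft.Azure.Search package.
public static class LuceneQueryEscaper
{
    // Characters reserved by the full Lucene query syntax:
    // + - & | ! ( ) { } [ ] ^ " ~ * ? : \ /
    private const string SpecialChars = "+-&|!(){}[]^\"~*?:\\/";

    // Returns the input with a backslash in front of every reserved character,
    // so the query parser treats it as literal text instead of an operator.
    public static string Escape(string text)
    {
        if (string.IsNullOrEmpty(text))
        {
            return text;
        }

        var escaped = new StringBuilder(text.Length * 2);
        foreach (char c in text)
        {
            if (SpecialChars.IndexOf(c) >= 0)
            {
                escaped.Append('\\');
            }
            escaped.Append(c);
        }
        return escaped.ToString();
    }
}
```

    For example, LuceneQueryEscaper.Escape("A-100") produces A\-100. Note that '@' and '#' are not in the reserved list, so they pass through unchanged. Escaping only affects how the query text is parsed. It does not change how the analyzer tokenised the field at indexing time.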