elasticsearch nest elasticsearch-net

Search with Russian text analyzer does not work


I have a very simple model for Elasticsearch:

[ElasticsearchType(RelationName = "example")]
public class ElasticModel
{
    [Text(Name = "description", Analyzer = "Russian", Index = true, SearchAnalyzer = "Russian")]
    public string Description { get; set; }
}

Then I initialize my index with the following code:

protected ICreateIndexRequest ConfigureIndex(CreateIndexDescriptor indexDescriptor,
    Func<IndexSettingsDescriptor, IPromise<IIndexSettings>> selectorOfIndexSettings)
{
    ICreateIndexRequest returnValue;

    returnValue = indexDescriptor.Settings(selectorOfIndexSettings);
    return returnValue;
}

await _client.Indices.CreateAsync(completeIndexName, indexDescriptor => ConfigureIndex(indexDescriptor, selector));

Then I index a document with the following value and try to search for it:

var document = new ElasticModel()
{
    Description = "В Москве все выходные будут дожди"
};

var responseDoc = await _client.IndexAsync(new IndexRequest<ElasticModel>(document, completeIndexName));

var responseSearch = await _client.SearchAsync<ElasticModel>(s => s.Index(completeIndexName)
    .Query(q => q.QueryString(c => c
        .Query("выходной")
    )));

but the result is empty. When I send the following request to my Elasticsearch server:

POST {{ElasticSearchAddress}}/_analyze
{
  "analyzer": "russian",
  "text": "В Москве все выходные будут дожди"
}

I see the expected result:

{
    "tokens": [
        {
            "token": "москв",
            "start_offset": 2,
            "end_offset": 8,
            "type": "<ALPHANUM>",
            "position": 1
        },
        {
            "token": "выходн",
            "start_offset": 13,
            "end_offset": 21,
            "type": "<ALPHANUM>",
            "position": 3
        },
        {
            "token": "будут",
            "start_offset": 22,
            "end_offset": 27,
            "type": "<ALPHANUM>",
            "position": 4
        },
        {
            "token": "дожд",
            "start_offset": 28,
            "end_offset": 33,
            "type": "<ALPHANUM>",
            "position": 5
        }
    ]
}

Can anybody explain why my search from C# code does not use the Russian analyzer and does not return the expected result?

UPDATE:

A request to /elastictest100/_search with the body:

{
  "query": {
    "multi_match" : {
      "query":    "выходные будут", 
      "fields": [ "description" ],
      "analyzer": "russian"
    }
  }
}

returns:

{
    "took": 2,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 3,
            "relation": "eq"
        },
        "max_score": 0.13353139,
        "hits": [
            {
                "_index": "mediadev-elastictest100",
                "_type": "_doc",
                "_id": "G2FzRnMBhdWoY2X4fmQo",
                "_score": 0.13353139,
                "_source": {
                    "description": "В Москве все выходные будут дожди"
                }
            },
            {
                "_index": "mediadev-elastictest100",
                "_type": "_doc",
                "_id": "HGGLRnMBhdWoY2X4AGSV",
                "_score": 0.13353139,
                "_source": {
                    "description": "В Москве все выходные будут дожди"
                }
            },
            {
                "_index": "mediadev-elastictest100",
                "_type": "_doc",
                "_id": "HWGMRnMBhdWoY2X4tGSY",
                "_score": 0.13353139,
                "_source": {
                    "description": "В Москве все выходные будут дожди"
                }
            }
        ]
    }
}

With the body:

{
  "query": {
    "multi_match" : {
      "query":    "выходной будет", 
      "fields": [ "description" ],
      "analyzer": "russian"
    }
  }
}

the same endpoint returns:

{
    "took": 1,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 0,
            "relation": "eq"
        },
        "max_score": null,
        "hits": []
    }
}

Solution

  • I am not familiar with NEST code, but I can give you a few pointers for debugging the issue.

    1. Print the JSON of your final search query so that you can run it against the REST search endpoint and check whether NEST generates the query you expect.
    2. Match queries analyze the search text, by default with the same analyzer that the field used at index time, whereas term queries are not analyzed at all, which causes this kind of issue. For a document to be returned, the search-time tokens must match the index-time tokens.

    The easiest approach is to take the search JSON and run it directly against your index through the ES REST endpoint to find the root cause.
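
    Both pointers can be sketched from C# roughly as follows. This is only a sketch under stated assumptions: NEST 7.x, a node reachable at http://localhost:9200, and the index name taken from the UPDATE in the question. The SearchDebugSketch class name is made up for illustration. It prints the request body that NEST actually sent, then asks Elasticsearch to analyze the sample text with whatever analyzer the description field really has in the index mapping.

```csharp
using System;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using Nest;

// Copy of the model from the question, so the sketch is self-contained.
[ElasticsearchType(RelationName = "example")]
public class ElasticModel
{
    [Text(Name = "description", Analyzer = "Russian", Index = true, SearchAnalyzer = "Russian")]
    public string Description { get; set; }
}

// Hypothetical helper class, not part of the original code.
public static class SearchDebugSketch
{
    public static async Task Main()
    {
        Console.OutputEncoding = Encoding.UTF8; // so Cyrillic text prints correctly

        // Assumptions: node address and index name; adjust to your environment.
        var indexName = "elastictest100";
        var settings = new ConnectionSettings(new Uri("http://localhost:9200"))
            .DisableDirectStreaming() // keep the request/response bytes on the response object
            .PrettyJson();
        var client = new ElasticClient(settings);

        // Pointer 1: run the search from the question and print the JSON that was sent.
        var search = await client.SearchAsync<ElasticModel>(s => s
            .Index(indexName)
            .Query(q => q.QueryString(c => c.Query("выходной"))));

        if (search.ApiCall.RequestBodyInBytes != null)
        {
            Console.WriteLine(Encoding.UTF8.GetString(search.ApiCall.RequestBodyInBytes));
        }

        // Pointer 2: analyze the text with the analyzer of the mapped field,
        // instead of naming the analyzer explicitly as in the _analyze request above.
        var analyze = await client.Indices.AnalyzeAsync(a => a
            .Index(indexName)
            .Field("description")
            .Text("В Москве все выходные будут дожди"));

        if (analyze.IsValid)
        {
            Console.WriteLine(string.Join(", ", analyze.Tokens.Select(t => t.Token)));
        }
        else
        {
            Console.WriteLine(analyze.DebugInformation);
        }
    }
}
```

    If the second call prints unstemmed tokens such as "выходные" rather than "выходн", the field is not being indexed with the Russian analyzer. That would be consistent with the UPDATE in the question, where only the unstemmed word "будут" matches, and it would suggest checking whether the attribute mapping was ever applied when the index was created, since the ConfigureIndex method shown configures only settings.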