Tags: c#, elasticsearch, nest

ElasticSearch match on list of int and string


I am fairly new to Elasticsearch and I am trying to write a query for our category page, where every product returned by ES is in the category. For some reason, the results include products outside of the category, and I cannot figure out why.

Product is a basic product containing a list of category IDs (a product can be in multiple categories). Apart from matching on categoryId, the query should search the product name and the long descriptions of the variants.

public IReadOnlyCollection<Product> GetByCategory(string value, int take, int categoryId)
    {
        value = string.Format("*{0}*", value);

        var query = new SearchDescriptor<Product>()
            .Index(this.index)
            .Query(q => q
              .Bool(b => b
                .Must(s => s
                  .Match(m => m
                    .Field(ff => ff
                      .AttachedCategoryIds.Contains(categoryId)
                    )
                  )
                )
                .Must(s => s
                  .QueryString(m => m
                    .Query(value)
                      .Fields(ff => ff
                        .Field(f => f.Name)
                        .Field(f => f.Variants.Select(input => input.LongDescription))
                      )
                      .Type(TextQueryType.CrossFields)
                    )
                  )
                )
              )
              .Size(take)
              .Sort(ss => ss.Descending(SortSpecialField.Score));

        var data = client.Search<Product>(query);

        var products = data.Documents;

        return products;
    }    

I expect Elasticsearch to return only products from the current category, but for some reason it also returns products that are not in any category or that are in a different category.


Solution

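  • Your query is not correct. In NEST, calling .Must() a second time on the same bool query descriptor replaces the first must clause instead of adding to it, so only the query_string clause is sent and the category restriction is lost. The first clause would not have helped anyway: the match query is given a boolean expression (AttachedCategoryIds.Contains(categoryId)) where a field is expected, and no value to match, so it cannot restrict results to the category.

    Assuming the following POCOs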

    public class Product
    {
        public string Name { get; set; }
        public List<Variant> Variants { get; set; }
        public List<int> AttachedCategoryIds { get; set; }
    }
    
    public class Variant
    {
        public string LongDescription { get; set; }
    }
    

    The query would be something like

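    using System;
    using System.Linq;
    using Nest;
    
    var index = "index_name";
    var categoryId = 1;
    
    var value = "this is the query";
    var take = 20;
    
    var query = new SearchDescriptor<Product>()
        .Index(index)
        .Query(q => q
            .Bool(b => b
                // Scored full-text clause over the product name and the variant descriptions
                .Must(s => s
                    .QueryString(m => m
                        .Query(value)
                        .Fields(ff => ff
                            .Field(f => f.Name)
                            .Field(f => f.Variants.First().LongDescription)
                        )
                        .Type(TextQueryType.CrossFields)
                    )
                )
                // Non-scoring clause: only products attached to categoryId can match
                .Filter(f => f
                    .Term(ff => ff.AttachedCategoryIds, categoryId)
                )
            )
        )
        .Size(take)
        .Sort(ss => ss.Descending(SortSpecialField.Score));
    
    // Assumes a node at localhost:9200; adjust the URI for your cluster
    var client = new ElasticClient(new ConnectionSettings(new Uri("http://localhost:9200")));
    var searchResponse = client.Search<Product>(query);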
    

    Some points

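    1. .Field(f => f.Variants.First().LongDescription) is an expression that NEST resolves to a field name string, which is serialized into the JSON request to target a field in Elasticsearch. In this case it resolves to "variants.longDescription": property names are camel-cased by default, and First() is never executed; it only lets the expression reach the properties of Variant.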

    2. A term query can be used to determine whether a field in Elasticsearch contains a particular value; on an array field such as attachedCategoryIds, it matches when any of the values equals the term. I've put the query into a bool query filter clause because I don't think you want to calculate a relevancy score for this part of the query: a document either has the term in the field or it doesn't.
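    To make point 1 more concrete, here is a simplified sketch of how an expression like f => f.Variants.First().LongDescription can be turned into a dotted field name: walk from the lambda body back to the parameter, step over calls such as First() without executing them, and camel-case each property name. This is an illustration under assumptions, not NEST's actual field resolver, which handles many more cases (attributes, custom field name inferrers and so on). FieldNameSketch and Infer are hypothetical names, and the POCOs are repeated from this answer so the block compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public class Variant
{
    public string LongDescription { get; set; }
}

public class Product
{
    public string Name { get; set; }
    public List<Variant> Variants { get; set; }
    public List<int> AttachedCategoryIds { get; set; }
}

// Hypothetical helper: a simplified model of field-name inference, NOT NEST's implementation.
public static class FieldNameSketch
{
    public static string Infer<T>(Expression<Func<T, object>> field)
    {
        var names = new Stack<string>();
        Expression current = field.Body;

        // Walk from the outermost member access back towards the lambda parameter.
        while (current != null && current.NodeType != ExpressionType.Parameter)
        {
            switch (current)
            {
                // Value-type members are boxed to object, which adds a Convert node.
                case UnaryExpression unary when unary.NodeType == ExpressionType.Convert:
                    current = unary.Operand;
                    break;
                // Each property access contributes one segment of the dotted path.
                case MemberExpression member:
                    names.Push(CamelCase(member.Member.Name));
                    current = member.Expression;
                    break;
                // Extension calls such as First() are not executed; step into their source.
                case MethodCallExpression call when call.Method.IsStatic && call.Arguments.Count > 0:
                    current = call.Arguments[0];
                    break;
                default:
                    throw new NotSupportedException($"Unsupported expression: {current}");
            }
        }

        // Stack enumerates last-pushed first, i.e. outermost property first.
        return string.Join(".", names);
    }

    private static string CamelCase(string name) =>
        string.IsNullOrEmpty(name) ? name : char.ToLowerInvariant(name[0]) + name.Substring(1);
}
```

    With these POCOs, Infer<Product>(f => f.Variants.First().LongDescription) returns "variants.longDescription", the same name that appears in the serialized request.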

    The NEST query above serializes to the following request:

    POST http://localhost:9200/index_name/product/_search 
    {
      "query": {
        "bool": {
          "filter": [
            {
              "term": {
                "attachedCategoryIds": {
                  "value": 1
                }
              }
            }
          ],
          "must": [
            {
              "query_string": {
                "fields": [
                  "name",
                  "variants.longDescription"
                ],
                "query": "this is the query",
                "type": "cross_fields"
              }
            }
          ]
        }
      },
      "size": 20,
      "sort": [
        {
          "_score": {
            "order": "desc"
          }
        }
      ]
    }
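    The effect of the filter clause can be modelled in memory. The sketch below is a crude stand-in, not how Elasticsearch executes the query: the filter keeps a product only if any of its category ids equals the term (the yes/no check from point 2) and contributes nothing to the score, while the must clause is approximated by counting query words found among the product name's words, so a product has to match at least one word (roughly query_string's default OR behaviour). ProductDoc, BoolQueryModel and the word-count scoring are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical document shape for the model: just a name and the category ids.
public record ProductDoc(string Name, IReadOnlyList<int> AttachedCategoryIds);

// Crude in-memory model of the bool query above; Elasticsearch does not work this way internally.
public static class BoolQueryModel
{
    // A term query on an array field matches if any element equals the term.
    public static bool TermMatches(IReadOnlyList<int> fieldValues, int term) =>
        fieldValues.Contains(term);

    // Stand-in for the query_string score: number of query words found among the name's words.
    public static int WordScore(string name, string[] queryWords)
    {
        var nameWords = name.ToLowerInvariant().Split(' ', StringSplitOptions.RemoveEmptyEntries);
        return queryWords.Count(w => nameWords.Contains(w));
    }

    public static List<(ProductDoc Doc, int Score)> Search(
        IEnumerable<ProductDoc> docs, string queryText, int categoryId, int size)
    {
        var queryWords = queryText.ToLowerInvariant().Split(' ', StringSplitOptions.RemoveEmptyEntries);

        return docs
            // filter clause: a yes/no check that does not change the score
            .Where(d => TermMatches(d.AttachedCategoryIds, categoryId))
            .Select(d => (Doc: d, Score: WordScore(d.Name, queryWords)))
            // must clause: the product has to match at least one query word
            .Where(hit => hit.Score > 0)
            // sort by score descending, then keep at most "size" hits
            .OrderByDescending(hit => hit.Score)
            .Take(size)
            .ToList();
    }
}
```

    For example, searching "red shoe" in category 1 over ("red shoe", [1, 2]), ("red red shoe", [3]) and ("blue shoe", [1]) returns the first product with score 2 and the third with score 1; the second is excluded by the filter even though its name contains both query words.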
    

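    This query assumes that Variants on Product is mapped as an object datatype; if it were mapped as a nested datatype, the query_string clause would have to be wrapped in a nested query. Using NEST's operator overloading, where && combines queries into a bool query's must clauses and the unary + operator places a query in a bool filter clause, it can be written more succinctly as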

    var query = new SearchDescriptor<Product>()
        .Index(index)
        .Query(q => q
            .QueryString(m => m
                .Query(value)
                .Fields(ff => ff
                    .Field(f => f.Name)
                    .Field(f => f.Variants.First().LongDescription)
                )
                .Type(TextQueryType.CrossFields)
            ) && +q
            .Term(ff => ff.AttachedCategoryIds, categoryId)
        )
        .Size(take)
        .Sort(ss => ss.Descending(SortSpecialField.Score));
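    The object-datatype assumption matters because of how Elasticsearch indexes arrays of objects under an object mapping: the inner fields are flattened, so every variant's description becomes one value of a single multi-valued field, variants.longDescription, which is why the query can target that field directly. The sketch below shows that flattened view for one product's text fields; it illustrates the indexing model and is not code that Elasticsearch or NEST runs. VariantDoc, ProductSource and ObjectFlattening are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical shapes mirroring the Product/Variant POCOs (text fields only).
public record VariantDoc(string LongDescription);
public record ProductSource(string Name, List<VariantDoc> Variants);

// Illustration of the flattened field view an object mapping produces for one product.
public static class ObjectFlattening
{
    public static Dictionary<string, List<string>> Flatten(ProductSource p) =>
        new Dictionary<string, List<string>>
        {
            ["name"] = new List<string> { p.Name },
            // Every variant's description becomes one value of the same field.
            ["variants.longDescription"] = p.Variants.Select(v => v.LongDescription).ToList(),
        };
}
```

    A product with two variants therefore ends up with two values under variants.longDescription, and a query against that field matches if either description matches.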