elasticsearch, nest

scroll_id returns 0 hits when used the second time


I have this code to get the scroll_id after doing the first search:

    var initSearch = client.LowLevel.Search<dynamic>(INDEX, TYPE, QUERY, x => x
        .AddQueryString("scroll", "1m")
        .AddQueryString("size", "2"));

    string scrollId = initSearch.Body["_scroll_id"].ToString();

Then I used the scroll_id in the second request, but it didn't return any hits:

    var scrollSearch = client.LowLevel.ScrollGet<dynamic>(x => x
        .AddQueryString("scroll", "1m")
        .AddQueryString("scroll_id", scrollId));
    scrollId = scrollSearch.Body["_scroll_id"].ToString();

    var searchHits = int.Parse(scrollSearch.Body["hits"]["total"].ToString());

searchHits.Count is zero. What might be causing this? Also, when I send the scroll request again in a loop, I expect the scroll_id to change, but its value stays the same.


Solution

  • A size of 2 returns at most 2 documents in each response, including the first one. So if 2 or fewer documents match the query, all of them are returned in the first response. Take the following example:

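    // LINQPad "C# Program" style; uses the System, System.Text, Elasticsearch.Net and Nest namespaces (NEST 5.x/6.x API)
    void Main()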
    {
        var pool = new SingleNodeConnectionPool(new Uri("http://localhost:9200"));
        var defaultIndex = "messages";
        var connectionSettings = new ConnectionSettings(pool)
            .DefaultIndex(defaultIndex)
            .PrettyJson()
            .DisableDirectStreaming()
            .OnRequestCompleted(response =>
                {
                    if (response.RequestBodyInBytes != null)
                    {
                        Console.WriteLine(
                            $"{response.HttpMethod} {response.Uri} \n" +
                            $"{Encoding.UTF8.GetString(response.RequestBodyInBytes)}");
                    }
                    else
                    {
                        Console.WriteLine($"{response.HttpMethod} {response.Uri}");
                    }
    
                    Console.WriteLine();
    
                    if (response.ResponseBodyInBytes != null)
                    {
                        Console.WriteLine($"Status: {response.HttpStatusCode}\n" +
                                 $"{Encoding.UTF8.GetString(response.ResponseBodyInBytes)}\n" +
                                 $"{new string('-', 30)}\n");
                    }
                    else
                    {
                        Console.WriteLine($"Status: {response.HttpStatusCode}\n" +
                                 $"{new string('-', 30)}\n");
                    }
                });
    
        var client = new ElasticClient(connectionSettings);
    
        if (client.IndexExists(defaultIndex).Exists)
        {
            client.DeleteIndex(defaultIndex);
        }
    
        client.IndexMany(new[]
        {
            new Message { Content = "message 1" },
            new Message { Content = "message 2" },
            new Message { Content = "message 3" },
            new Message { Content = "message 4" },
            new Message { Content = "message 5" },
            new Message { Content = "message 6" },
        });
    
        client.Refresh(defaultIndex);
    
        var searchResponse = client.Search<Message>(s => s
            .Scroll("1m")
            .Size(2)
            .Query(q => q
                .Terms(t => t
                    .Field(f => f.Content.Suffix("keyword"))
                    .Terms("message 1", "message 2")
                )
            )
        );
    
        searchResponse = client.Scroll<Message>("1m", searchResponse.ScrollId);
    }
    
    public class Message
    {
        public string Content { get; set; }
    }
    

    The search and scroll requests, with their responses, are logged as follows:

    ------------------------------
    
    POST http://localhost:9200/messages/message/_search?pretty=true&scroll=1m 
    {
      "size": 2,
      "query": {
        "terms": {
          "content.keyword": [
            "message 1",
            "message 2"
          ]
        }
      }
    }
    
    Status: 200
    {
      "_scroll_id" : "DnF1ZXJ5VGhlbkZldGNoBQAAAAAAAADGFnM1SnhtUVdIUmgtM1YyZ2NQei1hZEEAAAAAAAAAxxZzNUp4bVFXSFJoLTNWMmdjUHotYWRBAAAAAAAAAMgWczVKeG1RV0hSaC0zVjJnY1B6LWFkQQAAAAAAAADJFnM1SnhtUVdIUmgtM1YyZ2NQei1hZEEAAAAAAAAAyhZzNUp4bVFXSFJoLTNWMmdjUHotYWRB",
      "took" : 4,
      "timed_out" : false,
      "_shards" : {
        "total" : 5,
        "successful" : 5,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : 2,
        "max_score" : 0.6931472,
        "hits" : [
          {
            "_index" : "messages",
            "_type" : "message",
            "_id" : "AV8IkTSbM7nzQBTCbQok",
            "_score" : 0.6931472,
            "_source" : {
              "content" : "message 1"
            }
          },
          {
            "_index" : "messages",
            "_type" : "message",
            "_id" : "AV8IkTSbM7nzQBTCbQol",
            "_score" : 0.6931472,
            "_source" : {
              "content" : "message 2"
            }
          }
        ]
      }
    }
    
    ------------------------------
    
    POST http://localhost:9200/_search/scroll?pretty=true 
    {
      "scroll": "1m",
      "scroll_id": "DnF1ZXJ5VGhlbkZldGNoBQAAAAAAAADGFnM1SnhtUVdIUmgtM1YyZ2NQei1hZEEAAAAAAAAAxxZzNUp4bVFXSFJoLTNWMmdjUHotYWRBAAAAAAAAAMgWczVKeG1RV0hSaC0zVjJnY1B6LWFkQQAAAAAAAADJFnM1SnhtUVdIUmgtM1YyZ2NQei1hZEEAAAAAAAAAyhZzNUp4bVFXSFJoLTNWMmdjUHotYWRB"
    }
    
    Status: 200
    {
      "_scroll_id" : "DnF1ZXJ5VGhlbkZldGNoBQAAAAAAAADGFnM1SnhtUVdIUmgtM1YyZ2NQei1hZEEAAAAAAAAAxxZzNUp4bVFXSFJoLTNWMmdjUHotYWRBAAAAAAAAAMgWczVKeG1RV0hSaC0zVjJnY1B6LWFkQQAAAAAAAADJFnM1SnhtUVdIUmgtM1YyZ2NQei1hZEEAAAAAAAAAyhZzNUp4bVFXSFJoLTNWMmdjUHotYWRB",
      "took" : 1,
      "timed_out" : false,
      "_shards" : {
        "total" : 5,
        "successful" : 5,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : 2,
        "max_score" : 0.6931472,
        "hits" : [ ]
      }
    }
    
    ------------------------------
    

    Since only 2 documents match the query and size is 2, both documents are returned in the first response, so the following scroll response contains no hits. Note that hits.total is still 2 in the scroll response; it is the hits.hits array that is empty.
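    The question's code reads hits.total to decide whether the scroll returned anything, but, as the scroll response above shows, hits.total stays at 2 even when hits.hits is empty. A minimal sketch, assuming System.Text.Json is available (.NET Core 3.0 or later) instead of the low-level client's dynamic body used in the question. The HitCounts class is hypothetical, and it expects the Elasticsearch 5.x/6.x response shape in which hits.total is a plain number:

```csharp
using System;
using System.Text.Json;

// Hypothetical helper: reads the overall total and the size of the current page
// from a raw _search or _search/scroll response body.
public static class HitCounts
{
    // hits.total as a plain number (Elasticsearch 5.x/6.x shape, as in the responses above).
    // Elasticsearch 7.x returns an object here by default, so this would need adapting.
    public static long Total(string responseJson)
    {
        using (var doc = JsonDocument.Parse(responseJson))
        {
            return doc.RootElement.GetProperty("hits").GetProperty("total").GetInt64();
        }
    }

    // Length of the hits.hits array: the documents actually returned in this response.
    public static int PageCount(string responseJson)
    {
        using (var doc = JsonDocument.Parse(responseJson))
        {
            return doc.RootElement.GetProperty("hits").GetProperty("hits").GetArrayLength();
        }
    }
}
```

    Applied to the scroll response above, Total returns 2 while PageCount returns 0, which is why a scroll loop should check the page rather than hits.total.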

    You can compare hits.total from the initial search response with the number of documents it returned to decide whether you need to call the scroll API for more documents at all.
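    Because size applies to every response, including the first, the total from the initial response tells you how many scroll calls can still return documents. A small sketch of that arithmetic, assuming each response is filled up to size until the matches run out; the ScrollMath class and its method name are hypothetical:

```csharp
using System;

// Hypothetical helper illustrating "size applies to every response, including the first".
public static class ScrollMath
{
    // Number of _search/scroll calls that can still return documents after the
    // initial search, given hits.total from the initial response and the page size.
    public static long ScrollCallsWithHits(long total, int size)
    {
        if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));
        if (total <= size) return 0; // everything (or nothing) fits in the first response
        long responsesNeeded = (total + size - 1) / size; // ceiling(total / size)
        return responsesNeeded - 1; // the initial search already delivered one page
    }
}
```

    For the example above (2 matches, size 2) this returns 0, so the scroll call is unnecessary; if all 6 messages matched, it would return 2.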

    The actual _scroll_id value is an implementation detail and may or may not change between calls. I would not base any logic on its value; simply send the _scroll_id returned by the most recent response in the next scroll request.
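    A sketch of a scroll loop along these lines: it always forwards the _scroll_id from the latest response, never compares ids, and stops when a response contains no documents. The search, scroll and clearScroll delegates (and the ScrollDrain class) are hypothetical stand-ins so the loop can be exercised without a cluster; clearing the scroll at the end is an optional extra, not something the question requires:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: drains a scroll, always passing on the most recent scroll id.
public static class ScrollDrain
{
    public static List<T> ReadAll<T>(
        Func<(IReadOnlyCollection<T> Docs, string ScrollId)> search,
        Func<string, (IReadOnlyCollection<T> Docs, string ScrollId)> scroll,
        Action<string> clearScroll)
    {
        var all = new List<T>();
        var (docs, scrollId) = search(); // initial search, already holds the first page

        // An empty page means the scroll is exhausted; hits.total is not used here.
        while (docs.Count > 0)
        {
            all.AddRange(docs);
            // Do not compare ids; just forward whatever the last response returned.
            (docs, scrollId) = scroll(scrollId);
        }

        clearScroll(scrollId); // release the search context instead of waiting for it to expire
        return all;
    }
}
```

    With NEST 5.x/6.x, search could wrap client.Search<Message>(...) and return (r.Documents, r.ScrollId), scroll could wrap client.Scroll<Message>("1m", id) the same way, and clearScroll could call client.ClearScroll(c => c.ScrollId(id)); check these calls against the NEST version in use. Stopping on an empty page costs one extra scroll call, which the hits.total check described above avoids.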