elasticsearch kibana elasticsearch-py

Kibana Dev Tools and the elasticsearch-py client return different output


I have an ELK deployment that collects logs, and I need to pull out all log entries that include one specific string. The problem is that the same query gives different output in Kibana Dev Tools and in the elasticsearch Python client.

Here is the query in Kibana:

GET app_web_log-20180827/_search
{
  "query": {
    "bool": {
      "must": [
        { "match_phrase": { "message":   "Failed to call Billing API Server" }}
      ],
      "filter": [
        { "term":  { "deployment": "app_instance1" }},
        { "term":  { "module": "test_module" }}, 
        { "range": { "@timestamp": { "gte": 1535266800000, "lt": 1535353200000 }}} 
      ]
    }
  },
  "size": 5
}

Here is the output from Dev Tools:

{
  "took": 556,
  "timed_out": false,
  "_shards": {
    "total": 175,
    "successful": 175,
    "skipped": 165,
    "failed": 0
  },
  "hits": {
    "total": 400,
    "max_score": 34.769733,
    "hits": [
      {
        "_index": "app_web_log-20180827",
        "_type": "doc",
        "_id": "FMkHeWUB_hBu7Tio4Llg",
        "_score": 34.769733,
        "_source": {
          "beat": {
            "version": "6.2.4",
            "name": "app-web001",
            "hostname": "app-web001"
          },
          "offset": 349461,
          "@timestamp": "2018-08-27T01:38:03.049Z",
          "source": "/apphome/app_instance1/logs/test_module.log",
          "message": "2018-08-27 01:37:59,661 [http-bio-8168-exec-8] ERROR [Billing APIClientImpl] Failed to call Billing API Server. Billing API Billing server response error, tranId:c95cede3a011d97fd9f3d661eb961cb8",
          "module": "test_module",
          "@version": "1",
          "deployment": "app_instance1"
        }
      },
....

But when I run the same query with the elasticsearch Python client, it gives me nothing:

from elasticsearch import Elasticsearch
es = Elasticsearch([{'host': 'esserver', 'port': 9200, 'username': 'appuser', 'password': 'elastic'}])

body = {
  "query": { 
    "bool": { 
      "must": [
        { "match_phrase": { "message":   "Failed to call Billing API Server" }}
      ],
      "filter": [ 
        { "term":  { "deployment": "app_instance1" }},
        { "term":  { "module": "test_module" }}, 
        { "range": { "@timestamp": { "gte": 1535266800000, "lt": 1535353200000 }}} 
      ]
    }
  }
}
print body

page = es.search(index='app_web_log-20180827', doc_type='doc', body=body,
         scroll='2m', size=100)
sid = page['_scroll_id']
scroll_size = page['hits']['total']
while (scroll_size > 0):
    print "Scrolling..."
    page = es.scroll(scroll_id = sid, scroll = '2m')
    # Update the scroll ID
    sid = page['_scroll_id']
    # Get the number of results that we returned in the last scroll
    scroll_size = len(page['hits']['hits'])
    for m in page['hits']['hits']:
        msg = m['_source']['message']
        print msg

The only output is the printed query and a single "Scrolling..." line, with no messages:

{'query': {'bool': {'filter': [{'term': {'deployment': 'app_instance1'}}, {'term': {'module': 'test_module'}}, {'range': {'@timestamp': {'lt': 1535353200000, 'gte': 1535266800000}}}], 'must': [{'match_phrase': {'message': 'Failed to call Billing API Server'}}]}}}
Scrolling...

Is there anything wrong with the code? Any help is appreciated. Thanks.


Solution

  • I would recommend having a look at the scan helper ([0]), which handles the scroll logic for you.

    I assume the problem is that you only iterate over the page after you call scroll and not before, so you never process the hits returned by the initial search API call. You also have size set to 100, so it is quite possible that all the hits are in the first value of the page variable, which you are ignoring.

    0 - https://elasticsearch-py.readthedocs.io/en/master/helpers.html#scan
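
To make both points concrete, here is a minimal sketch. It is Python 3 (the question's code is Python 2) and uses the same search/scroll keyword arguments as the question. The names iter_scroll_hits and iter_scan_hits are hypothetical helpers made up for illustration, not part of elasticsearch-py, and es is assumed to be an already-connected client. The first function is the question's loop rearranged so the hits from the initial search call are handled before scroll is called; the second simply delegates to helpers.scan. I left out doc_type, which newer Elasticsearch versions no longer use; add it back if your cluster needs it.

```python
def iter_scroll_hits(es, index, body, scroll="2m", size=100):
    """Yield every hit for a query, starting with the first page.

    `es` is assumed to be a connected elasticsearch-py client.
    """
    # The initial search call already returns the first batch of hits.
    page = es.search(index=index, body=body, scroll=scroll, size=size)
    sid = page["_scroll_id"]
    hits = page["hits"]["hits"]

    # Process the current batch first, then ask for the next one.
    # An empty batch means the scroll is exhausted.
    while hits:
        for hit in hits:
            yield hit
        page = es.scroll(scroll_id=sid, scroll=scroll)
        sid = page["_scroll_id"]  # the scroll ID may change between calls
        hits = page["hits"]["hits"]


def iter_scan_hits(es, index, body, scroll="2m", size=100):
    """Same result via the scan helper, which runs the scroll loop itself."""
    # Imported here so the manual version above has no extra dependency.
    from elasticsearch.helpers import scan

    # Extra keyword arguments such as `index` are passed on to the search call.
    return scan(es, query=body, index=index, scroll=scroll, size=size)
```

With the client and body from the question, either one could be used along the lines of: for hit in iter_scroll_hits(es, 'app_web_log-20180827', body): print(hit['_source']['message']).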