I have an ELK deployment that collects logs, and I need to pull out all log entries that include one specific string. However, I get different results from the Dev Tools console in Kibana and from the Elasticsearch Python client.
Here is the query in Kibana:
GET app_web_log-20180827/_search
{
  "query": {
    "bool": {
      "must": [
        { "match_phrase": { "message": "Failed to call Billing API Server" }}
      ],
      "filter": [
        { "term": { "deployment": "app_instance1" }},
        { "term": { "module": "test_module" }},
        { "range": { "@timestamp": { "gte": 1535266800000, "lt": 1535353200000 }}}
      ]
    }
  },
  "size": 5
}
Here is the output of the Dev tool:
{
  "took": 556,
  "timed_out": false,
  "_shards": {
    "total": 175,
    "successful": 175,
    "skipped": 165,
    "failed": 0
  },
  "hits": {
    "total": 400,
    "max_score": 34.769733,
    "hits": [
      {
        "_index": "app_web_log-20180827",
        "_type": "doc",
        "_id": "FMkHeWUB_hBu7Tio4Llg",
        "_score": 34.769733,
        "_source": {
          "beat": {
            "version": "6.2.4",
            "name": "app-web001",
            "hostname": "app-web001"
          },
          "offset": 349461,
          "@timestamp": "2018-08-27T01:38:03.049Z",
          "source": "/apphome/app_instance1/logs/test_module.log",
          "message": "2018-08-27 01:37:59,661 [http-bio-8168-exec-8] ERROR [Billing APIClientImpl] Failed to call Billing API Server. Billing API Billing server response error, tranId:c95cede3a011d97fd9f3d661eb961cb8",
          "module": "test_module",
          "@version": "1",
          "deployment": "app_instance1"
        }
      },
      ....
But when I run the same query with the Elasticsearch Python client, it returns nothing:
from elasticsearch import Elasticsearch

es = Elasticsearch([{'host': 'esserver', 'port': 9200, 'username': 'appuser', 'password': 'elastic'}])
body = {
    "query": {
        "bool": {
            "must": [
                { "match_phrase": { "message": "Failed to call Billing API Server" }}
            ],
            "filter": [
                { "term": { "deployment": "app_instance1" }},
                { "term": { "module": "test_module" }},
                { "range": { "@timestamp": { "gte": 1535266800000, "lt": 1535353200000 }}}
            ]
        }
    }
}
print body
page = es.search(index='app_web_log-20180827', doc_type='doc', body=body,
                 scroll='2m', size=100)
sid = page['_scroll_id']
scroll_size = page['hits']['total']
while (scroll_size > 0):
    print "Scrolling..."
    page = es.scroll(scroll_id = sid, scroll = '2m')
    # Update the scroll ID
    sid = page['_scroll_id']
    # Get the number of results that we returned in the last scroll
    scroll_size = len(page['hits']['hits'])
    for m in page['hits']['hits']:
        msg = m['_source']['message']
        print msg
The only output is the printed query and a single "Scrolling..." line; no messages are printed:
{'query': {'bool': {'filter': [{'term': {'deployment': 'app_instance1'}}, {'term': {'module': 'test_module'}}, {'range': {'@timestamp': {'lt': 1535353200000, 'gte': 1535266800000}}}], 'must': [{'match_phrase': {'message': 'Failed to call Billing API Server'}}]}}}
Scrolling...
Is there anything wrong with the code? Any help is appreciated. Thanks.
I would recommend you have a look at the scan helper ([0]), which does the scrolling logic for you.
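As a rough sketch of how the scan helper could be used here, assuming the same index and query body as in the question: the function name scan_messages is made up for illustration, and the import sits inside the function only so that the snippet can be loaded without the package installed.

```python
def scan_messages(es, index, body, scroll="2m", size=100):
    """Yield the 'message' field of every matching document.

    `es` is assumed to be an elasticsearch.Elasticsearch client and `body`
    the full search body (the dict with the top-level "query" key).
    """
    # Lazy import: keeps the sketch loadable without elasticsearch installed.
    from elasticsearch.helpers import scan

    # scan() issues the initial search and all follow-up scroll calls itself,
    # and yields individual hits, including those on the first page.
    for hit in scan(es, query=body, index=index, scroll=scroll, size=size):
        yield hit["_source"]["message"]
```

With the client and body from the question, something like `for msg in scan_messages(es, 'app_web_log-20180827', body): print(msg)` should then print each matching message.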
I assume that, since you only iterate over page after you call scroll and not before, you are not processing the hits returned by your initial search API call. You also have size set to 100, so it is quite possible that all the hits are in the first value of the page variable, which you are ignoring.
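If you would rather keep the manual loop, one possible fix is to process each page before asking for the next one. The sketch below is only an illustration of that ordering: iter_scroll_hits is a hypothetical name, and `es` is assumed to be any client object exposing search, scroll and clear_scroll the way the Python client does.

```python
def iter_scroll_hits(es, index, body, scroll="2m", size=100):
    """Yield every hit for `body`, starting with the first search page."""
    page = es.search(index=index, body=body, scroll=scroll, size=size)
    sid = page["_scroll_id"]
    hits = page["hits"]["hits"]
    try:
        while hits:
            # Handle the current page first (this includes the search result).
            for hit in hits:
                yield hit
            # Only then fetch the next page and update the scroll ID.
            page = es.scroll(scroll_id=sid, scroll=scroll)
            sid = page["_scroll_id"]
            hits = page["hits"]["hits"]
    finally:
        # Free the server-side scroll context once we are done.
        es.clear_scroll(scroll_id=sid)
```

The loop stops on the first empty page rather than on hits.total, so it does not depend on how the total is reported.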
[0] https://elasticsearch-py.readthedocs.io/en/master/helpers.html#scan