I have a very strange situation with Elasticsearch 5.4 and NEST 5.4.0. I have written a simple C# console app that queries Elastic once per minute and returns hits/documents and stores them in a Postgres database for further processing. It works very well for a few hours and then begins to return queries with valid .DebugInformation with zero documents yet I can copy and run the same query in Kibana Dev Tools and get results. When I stop the console app and restart it then it successfully queries and returns hits and all is well. Below are code samples and log entries. I am trying to figure out why it stops working after awhile. I am using .NET Core C# Console app with NEST.
I am not sure if the .DebugInformation is returning any info about the ES health at that moment to see if there are issues with the ES cluster at that moment like 429s. I have looked at the elasticsearch.log and that only shows inserts. I am not sure if there is a place to find query problems.
Has anyone had issues of NEST working fine and then stopping?
Here is a query log with two runs. The first runs fine and returns 9 rows (I removed all but one in the sample due to sensitive data) and then it runs again but returns zero hits. All queries after that have zero hits in them until I restart the C# code again. Same start and end date inputs and I get the real data in Elastic....
2017-09-12 16:41:59.799 -05:00 [Information] Dates: Start 9/12/2017 4:41:00 PM End 9/12/2017 4:42:00 PM
2017-09-12 16:41:59.800 -05:00 [Debug] AlertService._queryErrors: 9/12/2017 4:41:00 PM End 9/12/2017 4:42:00 PM
2017-09-12 16:41:59.811 -05:00 [Debug] AlertService._elasticQueryLogErrors: elasticQuery {
"bool": {
"filter":
[ {
"range":
{ "@timestamp": { "gte": "2017-09-12T21:41:00Z",
"lte": "2017-09-12T21:42:00Z" }
}
},
{
"exists" : { "field" : "error_data" }
}
]
} }
2017-09-12 16:41:59.811 -05:00 [Debug] AlertService._elasticQueryLogErrors: searchResponse 9 : Valid NEST response built from a successful low level call on POST: /filebeat-%2A/_search
# Audit trail of this API call:
- [1] HealthyResponse: Node: http://servername:9200/ Took: 00:00:00.0112120
# Request:
{"from":0,"query":{
"bool": {
"filter":
[ {
"range":
{ "@timestamp": { "gte": "2017-09-12T21:41:00Z",
"lte": "2017-09-12T21:42:00Z" }
}
},
{
"exists" : { "field" : "error_data" }
}
]
} }
# Response:
{"took":7,"timed_out":false,"_shards":{"total":215,"successful":215,"failed":0},"hits":{"total":9,"max_score":0.0,"hits":[{"_index":"filebeat-2017.09.12","_type":"log","_id":"AV54Cdl2yay890uCUru4","_score":0.0,"_source":{"offset":237474,"target_url":"...url...","input_type":"log","source":"....source....","type":"log","tags":["xxx-001","beats_input_codec_plain_applied","@timestamp":"2017-09-12T21:41:02.000Z","@version":"1","beat":{"hostname":"xxx-001","name":"xxx-001","version":"5.4.3"},"host":"xxx-001","timestamp":"09/12/2017 16:41:02","error_data":"EXCEPTION, see detail log"}]}
2017-09-12 16:41:59.811 -05:00 [Debug] AlertService._queryErrors: (result) System.Collections.Generic.List`1[XX.Alerts.Core.Models.FilebeatModel]
2017-09-12 16:41:59.811 -05:00 [Information] ErrorCount: 9
2017-09-12 16:42:00.222 -05:00 [Information] Dates: Start 9/12/2017 4:42:00 PM End 9/12/2017 4:43:00 PM
2017-09-12 16:42:00.222 -05:00 [Debug] AlertService._queryErrors: 9/12/2017 4:42:00 PM End 9/12/2017 4:43:00 PM
2017-09-12 16:42:00.229 -05:00 [Debug] AlertService._elasticQueryLogErrors: elasticQuery {
"bool": {
"filter":
[ {
"range":
{ "@timestamp": { "gte": "2017-09-12T21:42:00Z",
"lte": "2017-09-12T21:43:00Z" }
}
},
{
"exists" : { "field" : "error_data" }
}
]
} }
2017-09-12 16:42:00.229 -05:00 [Debug] AlertService._elasticQueryLogErrors: searchResponse 0 : Valid NEST response built from a successful low level call on POST: /filebeat-%2A/_search
# Audit trail of this API call:
- [1] HealthyResponse: Node: http://servername:9200/ Took: 00:00:00.0066742
# Request:
{"from":0,"query":{
"bool": {
"filter":
[ {
"range":
{ "@timestamp": { "gte": "2017-09-12T21:42:00Z",
"lte": "2017-09-12T21:43:00Z" }
}
},
{
"exists" : { "field" : "error_data" }
}
]
} }
# Response:
{"took":4,"timed_out":false,"_shards":{"total":215,"successful":215,"failed":0},"hits":{"total":0,"max_score":null,"hits":[]}}
2017-09-12 16:42:00.229 -05:00 [Debug] AlertService._queryErrors: (result) System.Collections.Generic.List`1[Q2.Alerts.Core.Models.FilebeatModel]
2017-09-12 16:42:00.229 -05:00 [Information] ErrorCount: 0
Here is my NEST query
public IEnumerable<FilebeatModel> _elasticQueryLogErrors(DateTime startDate, DateTime endDate)
{
//var startDateString = startDate.Kind;
//var endDateString = endDate.Kind;
var searchQuery = @"{
""bool"": {
""filter"":
[ {
""range"":
{ ""@timestamp"": { ""gte"": """ + string.Format("{0:yyyy-MM-ddTHH:mm:ssZ}", startDate.ToUniversalTime()) +
@""",
""lte"": """ + string.Format("{0:yyyy-MM-ddTHH:mm:ssZ}", endDate.ToUniversalTime()) + @""" }
}
},
{
""exists"" : { ""field"" : ""error_data"" }
}
]
} }";
var searchResponse = _es.Search<FilebeatModel>(s => s
.AllTypes()
.From(0)
.Query(query => query.Raw(searchQuery)));
_logger.LogDebug("AlertService._elasticQueryLogErrors: elasticQuery " + searchQuery);
_logger.LogDebug("AlertService._elasticQueryLogErrors: searchResponse " + searchResponse.Hits.Count + " : " + searchResponse.DebugInformation);
foreach (var searchResponseHit in searchResponse.Hits)
{
searchResponseHit.Source.Id = searchResponseHit.Id;
}
return searchResponse.Documents.ToList();
}
Here is my constructor of the class that is running the above code in a loop. The loops may run for hours or days. This may be the area of my issue is how the connection is constructed for a long period of time. When I close and reopen the app running the queries over the period missed they run just fine.
public AlertService(IOptions<ElasticConfig> elasticConfig, AlertsDbContext context, ILogger<AlertService> logger)
{
_logger = logger;
_logger.LogDebug(" *** Entering AlertService");
string elasticConnectionString = elasticConfig.Value.ConnectionString;
string defaultIndex = elasticConfig.Value.IndexName;
var settings = new ConnectionSettings(
new Uri(elasticConnectionString))
.ConnectionLimit(-1)
.DisableDirectStreaming()
.DefaultIndex(defaultIndex);
_es = new ElasticClient(settings);
_context = context;
}
I have confirmed that this is a race condition created by myself as the internal timer creeps up on the call to Elastic as Val pointed out in the comments. It is not a bug in NEST but simply my code and its timing. I have aligned the call using System.Threading.Timer into a single callback per elapse and it works properly. Thanks to Val for his assistance