I need to migrate millions of records from a SQL database to ES. Currently we insert records in ES via GELF HTTP, but only doing that one record at a time just isn't feasible.
I've been working on this a couple days and am new to both GrayLog and ElasticSearch. I'm trying to find a way to Bulk insert messages into ES and then have them display in GrayLog. I've been using Cerebro to monitor the indexes and the number of messages in each of them. When I do the Bulk insert, the message count does increase in the correct Index, however I can not see them in GrayLog.
Here is what I have:
var _elasticsearchContext = new ElasticsearchContext(ConnectionString, new ElasticsearchMappingResolver());
var connectionSettings = new ConnectionSettings(new Uri(ConnectionString))
.MapDefaultTypeIndices(m => m.Add(typeof(Auditing_Dev), "auditing-dev_0"));
var elasticClient = new ElasticClient(connectionSettings);
var items = new List<Auditing_Dev>();
//I loop through a DataReader creating new Auditing_Dev objects
//and add them to the items collection
var bulkResponse = elasticClient.Bulk(b => b.IndexMany(items, (d, doc) => d.Document(doc).Index("auditing-dev_0").Type("message")));
I get back a valid response and I see the document count increase in Cerebro in the auditing-dev_0 index. When I compare a message that I insert via Bulk to one that is inserted via HTTP request, the indexes and types are the same.
Message I insert:
{
"_index" : "auditing-dev_0",
"_type" : "message",
"_id" : "AVsWWn-jNp2NX1vOria1",
"_version" : 1,
"found" : true,
"_source" : {
"level" : 5,
"origin" : "10.80.3.2",
"success" : true,
"type" : "Company.Enterprise",
"user" : "stupid@dropdown.test",
"gl2_source_input" : "57193c1d0cf25a44afc31c15",
"gl2_source_node" : "5866cc80-382e-4287-ae5b-8a0a68a9a1f1",
"gl2_remote_ip" : "10.100.20.164",
"gl2_remote_port" : 52273,
"streams" : [ "578fbabe738a897c6d91336b" ]
}
}
Compared to one inserted via HTTP:
{
"_index" : "auditing-dev_0",
"_type" : "message",
"_id" : "e3d34d50-0a8a-11e7-84bb-00155d007a32",
"_version" : 1,
"found" : true,
"_source" : {
"level" : 5,
"gl2_remote_ip" : "192.168.211.114",
"origin" : "192.168.211.35",
"gl2_remote_port" : 2960,
"streams" : [ "578fbabe738a897c6d91336b" ],
"gl2_source_input" : "57193c1d0cf25a44afc31c15",
"success" : "True",
"gl2_source_node" : "5866cc80-382e-4287-ae5b-8a0a68a9a1f1",
"user" : "admin@purple-pink.test",
"timestamp" : "2017-03-16 22:43:44.000"
}
}
I see the _id is a different format, but does that matter?
In GrayLog there is only one Input and that is the one for GELF HTTP. Do I need to add a new Input?
Turned out to be the Timestamp field not being present. Who knew?