I'm running an ELK stack and shipping all my Windows logs to it from nxlog, and I'm having an issue specifically with IIS logs. In nxlog.conf I have this extension defined:
<Extension w3c>
    Module      xm_csv
    Fields      $date, $time, $s-ip, $cs-method, $cs-uri-stem, $cs-uri-query, $s-port, $cs-username, $c-ip, $csUser-Agent, $sc-status, $sc-substatus, $sc-win32-status, $time-taken
    FieldTypes  string, string, string, string, string, string, string, string, string, string, string, string, string, string
    Delimiter   ' '
    UndefValue  -
</Extension>
I'm not doing any parsing in Logstash, and when the events show up in Elasticsearch/Kibana I get this giant message field:
{"message":"2015-10-19 22:17:26 10.10.10.10 GET javascriptScript.js - 443 - 10.10.10.10 Mozilla/4.0+(compatible;+MSIE+7.0;+Windows+NT+6.1;+WOW64;+Trident/7.0;+SLCC2;+.NET+CLR+2.0.50727;+.NET4.0C;+.NET4.0E;+.NET+CLR+3.5.30729;+.NET+CLR+3.0.30729) 200 0 0 31\r","@version":"1","@timestamp":"2015-10-19T22:19:08.061Z","host":"10.10.10.10","type":"WindowsEventLog","tags":["_jsonparsefailure"]}
I want to parse this message and get all the relevant data out of it. It seems like it should be possible to parse the IIS log in nxlog and pass the resulting JSON through to Elasticsearch, but I'm not sure whether this is something I should be doing on the nxlog side or the Logstash side. Everything I've looked at uses the same w3c extension, and there isn't much material covering how to use nxlog and Logstash together to parse IIS logs.
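For reference, the nxlog-side route I've seen suggested looks roughly like this, with the w3c extension above doing the field splitting and xm_json serializing the result (a sketch only; the file path, input name, and SavePos setting are placeholders for my setup):

<Extension json>
    Module      xm_json
</Extension>

<Input iis>
    Module      im_file
    # Placeholder path: point this at the actual IIS site's log directory
    File        "C:\\inetpub\\logs\\LogFiles\\W3SVC1\\u_ex*.log"
    SavePos     TRUE
    # W3C header lines start with '#': drop them, parse everything else
    Exec        if $raw_event =~ /^#/ drop();   \
                else                            \
                {                               \
                    w3c->parse_csv();           \
                    to_json();                  \
                }
</Input>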
You could add a grok filter to your Logstash config. In the grok filter you can essentially mirror the Fields definition from your nxlog config, using a pattern like this:
%{TIMESTAMP_ISO8601:ts}\s%{IP:s_ip}\s%{WORD:cs_method}\s%{DATA:cs_uri_stem}\s%{DATA:cs_uri_query}\s
This extracts the first few fields from your message (up to cs_uri_query); if you want to extract the rest as well, just extend the pattern. You can use the grok debugger (https://grokdebug.herokuapp.com/) to experiment with the various patterns. A list of the predefined patterns is here: https://github.com/elastic/logstash/tree/v1.4.2/patterns
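In full, the filter section of your Logstash config might look like the following (a sketch; the type check assumes your events arrive with type "WindowsEventLog", as in the output above):

filter {
  if [type] == "WindowsEventLog" {
    grok {
      # Mirror the nxlog Fields order, one grok pattern per field
      match => [ "message", "%{TIMESTAMP_ISO8601:ts}\s%{IP:s_ip}\s%{WORD:cs_method}\s%{DATA:cs_uri_stem}\s%{DATA:cs_uri_query}\s" ]
    }
  }
}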
RESPONSE #2: @pcport I think I see where your problem is. You are using the DATA pattern, which is a non-greedy regular expression; it is defined as:
DATA .*?
You can either tell the regex parser that you want to match until end-of-line (just put a $ at the end of your pattern), or, preferably, make your grok pattern more specific by using the NUMBER pattern instead of DATA for the numeric fields. Try this, for instance:
%{TIMESTAMP_ISO8601:time_stamp}\s%{IP:source_ip}\s%{WORD:cs_method}\s%{DATA:cs_uri_stem}\s%{DATA:cs_uri_query}\s%{NUMBER:source_port}\s%{DATA:username}\s%{IP:client_ip}\s%{DATA:client_browser}\s%{NUMBER:request_status}\s%{NUMBER:request_substatus}\s%{NUMBER:win32_status}\s%{NUMBER:timeTaken}
One additional hint: by default Logstash stores everything as a string in Elasticsearch. If you want to do calculations in Kibana (e.g. the average time taken across all requests), you need to convert the field to a number type (currently int and float are supported, according to https://www.elastic.co/guide/en/logstash/current/plugins-filters-grok.html). You do this inside the pattern, like this:
%{NUMBER:timeTaken:int}
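Putting both hints together, a complete filter block might look like the following (a sketch: it reuses the field names from the pattern above, converts the numeric fields inline, and optionally sets @timestamp from the log's own timestamp; note that IIS writes W3C logs in UTC):

filter {
  grok {
    # One pattern per field, mirroring the nxlog Fields order;
    # the :int suffix converts numeric fields at parse time
    match => [ "message", "%{TIMESTAMP_ISO8601:time_stamp}\s%{IP:source_ip}\s%{WORD:cs_method}\s%{DATA:cs_uri_stem}\s%{DATA:cs_uri_query}\s%{NUMBER:source_port:int}\s%{DATA:username}\s%{IP:client_ip}\s%{DATA:client_browser}\s%{NUMBER:request_status:int}\s%{NUMBER:request_substatus:int}\s%{NUMBER:win32_status:int}\s%{NUMBER:timeTaken:int}" ]
  }
  # Optional: use the log's own timestamp instead of ingest time
  date {
    match    => [ "time_stamp", "yyyy-MM-dd HH:mm:ss" ]
    timezone => "UTC"
  }
}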