I'm using Elasticsearch to retrieve a few logs:
http://localhost:9200/collection/_search?q=type:"log"
It brings me a few hits like this:
{
  "_index": "collection",
  "_type": "doc",
  "_id": "UL878GMBYKUUOvfyQJWl",
  "_score": 6.487114,
  "_source": {
    "@version": "1",
    "type": "log",
    "message": "64.242.88.10;[07/Mar/2004:16:11:58 -0800];\"GET /twiki/bin/view/TWiki/WikiSyntax HTTP/1.1\"; 200 7352\r",
    "@timestamp": "2018-06-11T19:03:23.163Z",
    "host": "logstash",
    "path": "/opt/access_log.log"
  }
}
Each hit has a "message" field, which holds one line from the CSV-like file "access_log.log".
But all the useful information is inside "message" as one big string, so I need to extract parts of it somehow, for example the server IP (64.242.88.10).
How can I split this "message" string on ";" (for example with a regex) so that I get only the data I need?
You can use the Logstash grok filter plugin for that.
Grok is a great way to parse unstructured log data into something structured and queryable.
This tool is perfect for syslog logs, Apache and other web server logs, MySQL logs, and in general any log format that is written for humans rather than for computer consumption.
Logstash ships with about 120 patterns by default. You can find them here: https://github.com/logstash-plugins/logstash-patterns-core/tree/master/patterns. You can add your own trivially. (See the patterns_dir setting)
If you need help building patterns to match your logs, you will find the http://grokdebug.herokuapp.com and http://grokconstructor.appspot.com/ applications quite useful!
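For the sample message in the question, a grok filter might look roughly like the sketch below. The field names (clientip, timestamp, verb, request, httpversion, response, bytes) are illustrative choices, not something taken from the question, and the pattern assumes every line follows the same semicolon-separated layout as the sample. It is worth checking it against real lines in one of the grok debuggers before relying on it.

```
filter {
  grok {
    # Single quotes, so the literal " characters in the log line need no escaping.
    # The names after each colon are illustrative; rename them as needed.
    match => {
      "message" => '%{IPORHOST:clientip};\[%{HTTPDATE:timestamp}\];"%{WORD:verb} %{NOTSPACE:request} HTTP/%{NUMBER:httpversion}";\s*%{NUMBER:response} %{NUMBER:bytes}'
    }
  }
}
```

This belongs in the Logstash pipeline configuration, so it only applies to events ingested after the change. Documents already in the index keep their single "message" string unless they are reindexed.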