Tags: elasticsearch, logstash, logstash-grok

Store Logstash Grok decomposition in a field


I am trying to parse logs that contain two URIs, and I would like to break each of them down using Logstash.

With this input (two URIs separated by a space):

https://www.elastic.co/guide/en/logstash/current/plugins-filters-grok.html?toto https://github.com/logstash-plugins/logstash-patterns-core/blob/master/patterns/grok-patterns

I would like to get the following document:

{
  "source": {
    "URI" : "https://www.elastic.co/guide/en/logstash/current/plugins-filters-grok.html?toto",
    "URIPROTO" : "https",
    "URIHOST": "www.elastic.co",
    "URIPATHPARAM": "/guide/en/logstash/current/plugins-filters-grok.html?toto",

    ...
  },
  "destination" : {
    "URI" : "https://github.com/logstash-plugins/logstash-patterns-core/blob/master/patterns/grok-patterns",
    "URIPROTO" : "https",
    "URIHOST": "github.com",
    "URIPATHPARAM": "/logstash-plugins/logstash-patterns-core/blob/master/patterns/grok-patterns",

    ...
  }
}

I have been trying to use this grok pattern:

%{URI:source} %{URI:destination}

But I get the following result, where the source and destination components (URIPROTO, URIHOST, ...) are merged into arrays at the root of my document:

{
  "source": [
    "https://www.elastic.co/guide/en/logstash/current/plugins-filters-grok.html?toto"
    ],
  "URIPROTO": [
    "https",
    "https"
  ],

  ...

  "URIHOST": [
    "www.elastic.co",
    "github.com"
  ],
  "IPORHOST": [
    "www.elastic.co",
    "github.com"
  ],
  "HOSTNAME": [
    "www.elastic.co",
    "github.com"
  ],
  "IP": [
    null,
    null
  ],

  ...

  "destination": [
    "https://github.com/logstash-plugins/logstash-patterns-core/blob/master/patterns/grok-patterns"
  ]
}

Has anyone encountered this situation and found a solution? Thanks in advance for your help!


Solution

  • With grok's default named_captures_only setting, the %{URI} pattern doesn't create any fields from the components of the URI. If you want them split out (for one URI or two), you'll need to write your own pattern. Copying the definition of URI and adding a field assignment to each component gives you this:

    %{URIPROTO:[foo][proto]}://(?:%{USER:[foo][user]}(?::[^@]*)?@)?(?:%{URIHOST:[foo][host]})?(?:%{URIPATHPARAM:[foo][pathparam]})? %{URIPROTO:[bar][proto]}://(?:%{USER:[bar][user]}(?::[^@]*)?@)?(?:%{URIHOST:[bar][host]})?(?:%{URIPATHPARAM:[bar][pathparam]})?
    

    Note that the URIHOST pattern itself contains a named 'port' capture, so with two URIs both ports would end up in the same field. To solve this, replace each URIHOST in my pattern with:

     %{IPORHOST}(?::%{POSINT:[myField]})?
    

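    Use the desired destination field for each 'myField' (for example [foo][port] and [bar][port]). Since this replacement drops the name that was attached to URIHOST, add it to IPORHOST instead, e.g. %{IPORHOST:[foo][host]}, to keep the host in its own field.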
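
    Putting the two steps together, a complete filter might look like the sketch below. It is only an illustration under a few assumptions: the log line is in the "message" field, the nested targets are named [source] and [destination] to match the document asked for in the question, and the sub-field names (proto, user, host, port, pathparam) are arbitrary choices rather than anything Logstash requires.

```
filter {
  grok {
    match => {
      # One expanded copy of the URI definition per URI, separated by a single space.
      # URIHOST is expanded to IPORHOST plus an optional port so each port gets its own field.
      "message" => "%{URIPROTO:[source][proto]}://(?:%{USER:[source][user]}(?::[^@]*)?@)?(?:%{IPORHOST:[source][host]}(?::%{POSINT:[source][port]})?)?(?:%{URIPATHPARAM:[source][pathparam]})? %{URIPROTO:[destination][proto]}://(?:%{USER:[destination][user]}(?::[^@]*)?@)?(?:%{IPORHOST:[destination][host]}(?::%{POSINT:[destination][port]})?)?(?:%{URIPATHPARAM:[destination][pathparam]})?"
    }
  }
}
```

    With the sample input from the question, this should produce [source][proto] = "https", [source][host] = "www.elastic.co" and [source][pathparam] = "/guide/en/logstash/current/plugins-filters-grok.html?toto", with the matching values under [destination]. The user and port fields are only set when the URI contains them. Note that this pattern does not store the full URI under either object.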