Tags: elasticsearch, amazon-elastic-beanstalk, logstash, logstash-grok

How to extract variables from log file path, test log file name for pattern in Logstash?


I have AWS ElasticBeanstalk instance logs on S3 bucket.

Path to Logs is:

resources/environments/logs/publish/e-3ykfgdfgmp8/i-cf216955/_var_log_nginx_rotated_access.log1417633261.gz

which translates to:

resources/environments/logs/publish/e-[random environment id]/i-[random instance id]/

The path contains multiple logs:

_var_log_eb-docker_containers_eb-current-app_rotated_application.log1417586461.gz
_var_log_eb-docker_containers_eb-current-app_rotated_application.log1417597261.gz
_var_log_rotated_docker1417579261.gz
_var_log_rotated_docker1417582862.gz
_var_log_rotated_docker-events.log1417579261.gz
_var_log_nginx_rotated_access.log1417633261.gz

Notice that AWS inserts some random number (a timestamp?) into the filename before ".gz".

The problem is that I need to set variables depending on the log file name.

Here's my configuration:

input {
        s3 {
                debug => "true"
                bucket => "elasticbeanstalk-us-east-1-something"
                region => "us-east-1"
                region_endpoint => "us-east-1"
                credentials => ["..."]
                prefix => "resources/environments/logs/publish/"
                sincedb_path => "/tmp/s3.sincedb"
                backup_to_dir => "/tmp/logstashed/"
                tags => ["s3","elastic_beanstalk"]
                type => "elastic_beanstalk"
        }
}

filter {
 if [type] == "elastic_beanstalk" {
  grok {
    match => [ "@source_path", "resources/environments/logs/publish/%{environment}/%{instance}/%{file}<unnecessary_number>.gz" ]
  }
 }
}

In this case I want to extract the environment, instance, and file name from the path. In the file name I need to ignore that random number. Am I doing this the right way? What would be the full, correct solution for this?


Another question: how can I specify fields for a custom log format for a particular log file from the list above?

This could be something like: (meta-code)

filter {
  if [type] == "elastic_beanstalk" {
    if [file_name] BEGINS WITH "application_custom_log" {
      grok {
        match => [ "message", "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}" ]
      }
    }

    if [file_name] BEGINS WITH "some_other_custom_log" {
      ....
    }
  }
}

How do I test for file name pattern?


Solution

  • For your first question, and assuming that @source_path contains the full path, try:

    match => [ "@source_path", "logs/publish/%{NOTSPACE:env}/%{NOTSPACE:instance}/%{NOTSPACE:file}%{NUMBER}%{NOTSPACE:suffix}" ]
    

    This will create four Logstash fields for you:

    • env
    • instance
    • file
    • suffix

    More information is available in the grok filter documentation, and you should test your patterns with the Grok Debugger.
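    Putting the pattern into a complete filter block (a sketch; it assumes the S3 input exposes the object key in `@source_path`, as in the question's configuration):

        filter {
          if [type] == "elastic_beanstalk" {
            grok {
              # For the sample path in the question this yields:
              #   env      => "e-3ykfgdfgmp8"
              #   instance => "i-cf216955"
              #   file     => "_var_log_nginx_rotated_access.log"
              #   suffix   => ".gz"
              # The unnamed %{NUMBER} swallows the random timestamp.
              match => [ "@source_path", "logs/publish/%{NOTSPACE:env}/%{NOTSPACE:instance}/%{NOTSPACE:file}%{NUMBER}%{NOTSPACE:suffix}" ]
            }
          }
        }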

    To test fields in Logstash, use conditionals, e.g.:

    if [field] == "value"
    if [field] =~ /regexp/
    

    etc.
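
    Combined with the fields extracted above, the "BEGINS WITH" test from the question becomes a regex conditional anchored with `^` (a sketch; the file name prefix is taken from the question's log list, and the grok pattern is the one from the question, used here only as a placeholder):

        filter {
          if [type] == "elastic_beanstalk" {
            # "begins with" == a regex anchored at the start of the field
            if [file] =~ /^_var_log_nginx/ {
              grok {
                match => [ "message", "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}" ]
              }
            }
          }
        }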

    Note that conditionals are not always necessary for this. A single grok filter can take multiple 'match' patterns, and by default it stops after the first one that matches (`break_on_match => true`). If your patterns are mutually exclusive, this should work for you.
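
    For example, a single grok filter trying several patterns in order (a sketch; the second pattern is a hypothetical placeholder showing the shape, not one of the question's log formats):

        grok {
          # Patterns are tried in order; by default grok stops at the
          # first one that matches (break_on_match => true).
          match => [
            "message", "%{IP:client} %{WORD:method} %{URIPATHPARAM:request} %{NUMBER:bytes} %{NUMBER:duration}",
            "message", "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:msg}"
          ]
        }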