Tags: ruby, logging, amazon-s3, logstash, logstash-grok

Creating a combined S3 logfile that can be parsed by Logstash


I've written a script to continuously pull all my S3 bucket logfiles down to my Logstash server so they can be parsed using the patterns in this pull request. Alas, because the script recreates the logfile from scratch on each run instead of just appending to it, Logstash's file input isn't seeing any new changes. Any ideas?

Script below:

#!/usr/bin/ruby

require 'rubygems'
require 'aws/s3'

# for non-us buckets, we need to change the endpoint
AWS.config(:s3_endpoint => "s3-eu-west-1.amazonaws.com")

# connect to S3
s3 = AWS::S3.new(:access_key_id => S3_ACCESS_KEY, :secret_access_key => S3_SECRET_KEY)

# grab the bucket where the logs are stored
bucket = s3.buckets[BUCKET_NAME]

File.open("/var/log/s3_bucket.log", 'w') do |file|

  # grab all the objects in the bucket, can also use a prefix here and limit what S3 returns
  bucket.objects.with_prefix('staticassets-logs/').each do |log|
    log.read do |line|
      file.write(line)
    end
  end
end
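
Presumably I need something closer to the sketch below (untested): it opens the combined logfile in append mode and only writes objects it hasn't already written, with a made-up .seen file used here purely to illustrate tracking which keys have been appended.

#!/usr/bin/ruby

require 'rubygems'
require 'aws/s3'

AWS.config(:s3_endpoint => "s3-eu-west-1.amazonaws.com")
s3 = AWS::S3.new(:access_key_id => S3_ACCESS_KEY, :secret_access_key => S3_SECRET_KEY)
bucket = s3.buckets[BUCKET_NAME]

# keys already appended on previous runs (illustrative tracking file)
seen_path = "/var/log/s3_bucket.seen"
seen = File.exist?(seen_path) ? File.readlines(seen_path).map(&:chomp) : []

# open in append mode so the file keeps its inode and simply grows,
# which is what a tailing file input expects to see
File.open("/var/log/s3_bucket.log", 'a') do |file|
  bucket.objects.with_prefix('staticassets-logs/').each do |log|
    next if seen.include?(log.key)
    file.write(log.read)
    File.open(seen_path, 'a') { |f| f.puts(log.key) }
  end
end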

Any help? Thanks!


Solution

  • I ended up changing my script to the following:

    #!/bin/bash
    export PATH=$PATH:/bin:/usr/bin
    # sync the bucket's log prefix into the local log directory,
    # downloading only objects that aren't already present locally
    cd /var/log/s3/$S3_BUCKET/
    export s3url=s3://$S3_BUCKET/$S3_PREFIX
    s3cmd -c /home/logstash/.s3cfg sync --skip-existing $s3url .
    
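    Since the goal is to keep pulling logs down continuously, the sync script can be run from cron. Something like the entry below would do it, where the five-minute interval and the /usr/local/bin/s3_sync.sh path are only examples (and S3_BUCKET/S3_PREFIX are assumed to be set inside the script):

    # illustrative crontab entry for the logstash user: sync new log objects every five minutes
    */5 * * * * /usr/local/bin/s3_sync.sh >> /var/log/s3_sync.log 2>&1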

    ...And changing my Logstash file input from watching a single logfile to globbing the entire /var/log/s3/my_bucket directory:

    input {
      file {
        type => "s3-access-log"
        path => "/var/log/s3/$S3_BUCKET/$S3_BUCKET/*"
        sincedb_path => "/dev/null"
        start_position => "beginning"
      }
    }
    filter {
        if [type] == "s3-access-log" {
            grok {
                patterns_dir => ["/etc/logstash/conf.d/patterns"]
                match => { "message" => "%{S3_ACCESS_LOG}" }
                remove_field => ["message"]
            }
            date {
                match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
                remove_field => ["timestamp"]
            }
        }
    }
    output {
      elasticsearch { host => localhost }
      stdout { codec => rubydebug }
    }
    

    Works brilliantly now.
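
    As a sanity check, the config can be validated before restarting Logstash. Assuming a 1.x install (matching the elasticsearch { host => ... } syntax above), something along these lines:

    # test the pipeline config without starting Logstash (install path is just an example)
    /opt/logstash/bin/logstash agent -f /etc/logstash/conf.d/ --configtest

    One thing to keep in mind about this config: with sincedb_path => "/dev/null" and start_position => "beginning", Logstash keeps no record of how far it has read, so every file under the glob is re-read from the start whenever Logstash restarts.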