I've written a script to continuously pull all of my S3 bucket logfiles down to my Logstash server, so they can be parsed using the patterns in this pull request. Alas, because the script recreates the logfile from scratch instead of appending to it, Logstash's file input isn't seeing any new changes. Any ideas?
Script below:
#!/usr/bin/ruby
require 'rubygems'
# the AWS.config / AWS::S3 API used below comes from the aws-sdk gem (v1), not the old aws/s3 gem
require 'aws-sdk'
# for non-US buckets, we need to change the endpoint
AWS.config(:s3_endpoint => "s3-eu-west-1.amazonaws.com")
# connect to S3
s3 = AWS::S3.new(:access_key_id => S3_ACCESS_KEY, :secret_access_key => S3_SECRET_KEY)
# grab the bucket where the logs are stored
bucket = s3.buckets[BUCKET_NAME]
# 'w' truncates the file and rewrites it from scratch on every run
File.open("/var/log/s3_bucket.log", 'w') do |file|
  # grab all the objects in the bucket; the prefix limits what S3 returns
  bucket.objects.with_prefix('staticassets-logs/').each do |log|
    # read streams each object's contents in chunks
    log.read do |chunk|
      file.write(chunk)
    end
  end
end
Any help would be appreciated. Thanks!
I ended up changing my script to the following:
#!/bin/bash
# make sure common bin directories are on the PATH (e.g. when run from cron)
export PATH=$PATH:/bin:/usr/bin
cd /var/log/s3/$S3_BUCKET/
export s3url=s3://$S3_BUCKET/$S3_PREFIX
# sync only the objects we don't already have locally, leaving existing files untouched
s3cmd -c /home/logstash/.s3cfg sync --skip-existing $s3url .
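To keep pulling logs down continuously, the sync script just needs to run on a schedule. A minimal sketch of a crontab entry, assuming the script above is saved as /home/logstash/s3_sync.sh with $S3_BUCKET and $S3_PREFIX set inside it (that path and the five-minute interval are placeholders, not from my actual setup):
# run the S3 sync every five minutes and append its output to a log
*/5 * * * * /home/logstash/s3_sync.sh >> /var/log/s3_sync.log 2>&1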
...and changing my Logstash file input from watching a single logfile to globbing the entire /var/log/s3/my_bucket directory:
input {
  file {
    type => "s3-access-log"
    path => "/var/log/s3/$S3_BUCKET/$S3_BUCKET/*"
    sincedb_path => "/dev/null"
    start_position => "beginning"
  }
}

filter {
  if [type] == "s3-access-log" {
    grok {
      patterns_dir => ["/etc/logstash/conf.d/patterns"]
      match => { "message" => "%{S3_ACCESS_LOG}" }
      remove_field => ["message"]
    }
    date {
      match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
      remove_field => ["timestamp"]
    }
  }
}

output {
  elasticsearch { host => localhost }
  stdout { codec => rubydebug }
}
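For reference, each event the grok filter has to match is a single S3 server access log line, which looks roughly like this (example adapted from the log format AWS documents, not from my own logs):
79a59df900b949e55d96a1e698fbacedfd6e09d98eacf8f8d5218e7cd47ef2be mybucket [06/Feb/2014:00:00:38 +0000] 192.0.2.3 79a59df900b949e55d96a1e698fbacedfd6e09d98eacf8f8d5218e7cd47ef2be 3E57427F3EXAMPLE REST.GET.OBJECT doc/index.html "GET /doc/index.html HTTP/1.1" 200 - 3462992 3462992 70 10 "-" "curl/7.15.1" -
The actual S3_ACCESS_LOG pattern comes from the pull request mentioned in the question and lives under /etc/logstash/conf.d/patterns; a deliberately simplified sketch of such a pattern definition (not the real one) would be along these lines:
S3_ACCESS_LOG %{NOTSPACE:owner} %{NOTSPACE:bucket} \[%{HTTPDATE:timestamp}\] %{IP:clientip} %{NOTSPACE:requester} %{NOTSPACE:request_id} %{NOTSPACE:operation} %{NOTSPACE:key} "%{DATA:request}" %{NOTSPACE:response} %{NOTSPACE:error_code} %{NOTSPACE:bytes} %{NOTSPACE:object_size} %{NOTSPACE:request_time_ms} %{NOTSPACE:turnaround_time_ms} %{QS:referrer} %{QS:agent} %{NOTSPACE:version_id}
The timestamp field it extracts is what the date filter above then parses with dd/MMM/yyyy:HH:mm:ss Z.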
Works brilliantly now.