Tags: email, parsing, logstash, imap, elastic-stack

How to configure Logstash to use one event for one file (for parsing mails)?


Goal: analyze a large set of emails stored in files. I used the offlineimap tool to download the emails to local files.

I am somewhat familiar with ELK, but I am not sure how to configure Logstash so that it stores exactly one event per file.

I have not tried the multiline plugin yet because I do not have a complete set of rules for how the files start and end. I just want to parse all the files and store one event per file, regardless of how each one starts or ends.

NOTE: I could not use the Logstash imap plugin because it fetches and stores only new emails; it does not load all mails from the server.

A similar question for a different use case, "Logstash Multiline filter", has unfortunately gone unanswered for more than a couple of years.


Solution

  • The solution was suggested in the comments on "Logstash Multiline filter", and it worked. Basically, I had to append a marker string to the end of every file and then use the multiline plugin.

    I created a shell script that appends the extra line to every file:

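    # In bash, ** only recurses after `shopt -s globstar`; zsh recurses by default
    for file in **/**/*; do
      [ -f "$file" ] || continue   # skip directories matched by the glob
      echo 'ENDOFMAILFILE' >> "$file"
    done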
    

    After that, I used the multiline plugin (as a codec on the file input) in Logstash:

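    input {
      file {
        type => "logmail"
        path => [ "/var/log/mail/**/*" ]
        # read existing files from the top instead of tailing them
        start_position => "beginning"
        codec => multiline {
          # the marker line appended by the shell script
          pattern => "^ENDOFMAILFILE$"
          # every line that is NOT the marker ...
          negate => "true"
          # ... is joined to the previous line, so one file becomes one event
          what => "previous"
        }
      }
    }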
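
The marker-appending step can also be written so that it is safe to run more than once. The sketch below is not the script I originally ran: the function name `append_marker` and the `MARKER` variable are hypothetical, and it assumes a POSIX `sh` with `find` and `tail` available. It skips files that already end with the marker and writes a newline before the marker, so the marker sits on its own line even when a mail file does not end with one.

```shell
# Marker string that the multiline pattern looks for (same text as in the answer).
MARKER='ENDOFMAILFILE'

# append_marker DIR: hypothetical helper, appends MARKER to every regular file under DIR.
append_marker() {
  # find -type f skips directories; "-exec ... {} +" passes file names safely,
  # including names with spaces.
  find "$1" -type f -exec sh -c '
    marker=$1; shift
    for file do
      # Skip files whose last line is already the marker, so re-runs are harmless.
      if [ "$(tail -n 1 "$file")" = "$marker" ]; then
        continue
      fi
      # The leading newline keeps the marker on its own line even if the
      # file has no trailing newline.
      printf "\n%s\n" "$marker" >> "$file"
    done
  ' sh "$MARKER" {} +
}
```

Usage would be something like `append_marker /var/log/mail`, run once before starting Logstash. The extra blank line before the marker ends up in the event text, which should be harmless for mail bodies.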
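
Two details of the multiline codec may matter for long mails, so here is a hedged variant of the input above. As far as I know, the codec stops collecting at `max_lines` (default 500) or `max_bytes` (default 10 MiB) and emits the partial event with a `multiline_codec_max_lines_reached` or `multiline_codec_max_bytes_reached` tag, so a large mail could be split. Also, with `what => "previous"` the marker line starts a new buffered event of its own, which `auto_flush_interval` will eventually emit. The limit values and the `drop` filter below are assumptions for illustration, not part of my original configuration.

```
input {
  file {
    type => "logmail"
    path => [ "/var/log/mail/**/*" ]
    start_position => "beginning"
    codec => multiline {
      pattern => "^ENDOFMAILFILE$"
      negate => true
      what => "previous"
      # Assumed limits: raise them above the largest mail you expect
      max_lines => 100000
      max_bytes => "50 MiB"
      # Assumed value: flush a pending event after 5 seconds without new lines
      auto_flush_interval => 5
    }
  }
}

filter {
  # Assumption: the marker line surfaces as its own event and can be discarded
  if [message] == "ENDOFMAILFILE" {
    drop { }
  }
}
```

Checking for the two `multiline_codec_max_*_reached` tags in the indexed events is a quick way to see whether any mail was cut short.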