How to parse mailbox file in Ruby?


The Ruby gem rmail has methods to parse a mailbox file on local disk. Unfortunately this gem is broken in Ruby 2.0.0. It might not get fixed, because folks are migrating to the mail gem.

The mail gem has the method Mail.read('filename.txt'), but that parses only the first message in a mailbox.

That gem, and the builtin Net::IMAP, have flooded the net with tutorials on accessing mailboxes through IMAP.

So, is there still a way to parse a plain old file, without IMAP? As the lone rubyist in my group I'd rather not embarrass myself by resorting to http://docs.python.org/2/library/mailbox.html.

Or, worse yet, PHP's imap_open('/var/mail/www-data', ...) -- if only Net::IMAP.new accepted filenames like that.


Solution

  • The good news is that the mbox format is dead simple, though its simplicity is why it was eventually replaced. Parsing a large mailbox file to extract a single message is not especially efficient, because the whole file has to be scanned line by line.

    If you can split apart the mailbox file into separate strings, you can pass these strings to the Mail library for parsing.

    An example starting point:

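    require 'mail'

    def parse_message(message)
      mail = Mail.new(message)

      # Placeholder: do whatever you need with the parsed message here.
      do_other_stuff!(mail)
    end

    message = nil

    while (line = STDIN.gets)
      if line.start_with?('From ')
        # A "From " separator line ends the previous message, if any.
        parse_message(message) if message
        message = +'' # unary plus gives a mutable string
      elsif message
        # Undo the ">From" quoting applied to body lines.
        message << line.sub(/\A>From /, 'From ')
      end
    end

    # The last message has no separator after it, so handle it here.
    parse_message(message) if message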
    

    The key is that each message starts with a line beginning with "From ", where the trailing space matters. The From header inside a message is written as "From:" with a colon, so it is not mistaken for a separator, and any body line that starts with ">From" is to be treated as actually being "From". It's things like this that make this encoding method really inadequate, but if Maildir isn't an option, this is what you've got to do.
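
The splitting rules described above can also be tried without the mail gem installed. What follows is only a minimal sketch: `split_mbox` is a hypothetical helper name, not part of any library, and the sample mailbox is made up for illustration. It assumes the simple quoting scheme from the answer, where a body line beginning with ">From " stands for a literal "From ".

```ruby
require 'stringio'

# Hypothetical helper: splits an mbox stream into an array of raw message
# strings. The "From " separator lines themselves are dropped.
def split_mbox(io)
  messages = []
  current = nil

  io.each_line do |line|
    if line.start_with?('From ')
      # A separator line closes the previous message and opens a new one.
      messages << current if current
      current = +''
    elsif current
      # Undo the ">From " quoting applied to body lines.
      current << line.sub(/\A>From /, 'From ')
    end
    # Lines before the first separator are ignored.
  end

  # The last message has no separator after it.
  messages << current if current
  messages
end

# Made-up sample data with two messages and one quoted body line.
sample = <<~MBOX
  From alice@example.com Mon Jan  1 00:00:00 2024
  Subject: first

  Hello
  >From here on, this line was quoted.

  From bob@example.com Tue Jan  2 00:00:00 2024
  Subject: second

  Bye
MBOX

messages = split_mbox(StringIO.new(sample))
puts messages.length
puts messages.first
```

Each string in `messages` can then be handed to Mail.new, as in the answer. For a real mailbox, `File.open('filename.txt') { |f| split_mbox(f) }` should work the same way, although collecting every message in an array may use a lot of memory on a large file.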