ruby-on-rails ruby character-encoding imap

Incompatible character encoding in Rails - how to just fail/skip sensibly?


I'm having an issue importing email subjects via IMAP. I think the problem is related to the £ sign in some subjects. I've spent a couple of hours going through various answers and can't find anything that works. Here is what I've tried, each attempt followed by the error it produces.

Using Ruby 2.1.2, in views/emails/index:

=email.subject
incompatible character encodings: ASCII-8BIT and UTF-8

=email.subject.scrub
incompatible character encodings: ASCII-8BIT and UTF-8

= email.subject.encode!('UTF-8', 'UTF-8', :invalid => :replace)
invalid byte sequence in UTF-8

= email.subject.force_encoding('UTF-8')
invalid byte sequence in UTF-8

= email.subject.encode("UTF-8", invalid: :replace)
"\xA3" from ASCII-8BIT to UTF-8

\xA3 is the '£' sign (in ISO-8859-1 and Windows-1252), which shouldn't be that unusual.
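For illustration, here is a minimal sketch of why the attempts above fail and how the byte can be recovered. It assumes the subject arrives as raw bytes tagged ASCII-8BIT and that the sender used ISO-8859-1; that charset is a guess based on the \xA3 byte, not something the mail is known to declare. The sample subject text is made up.

```ruby
# Hypothetical subject as it might arrive from IMAP: binary-tagged, with a lone 0xA3 byte.
raw = "Invoice for \xA3100".b

# In ASCII-8BIT every byte is valid, so #scrub has nothing to replace.
puts raw.encoding          # ASCII-8BIT
puts raw.valid_encoding?   # true
puts raw.scrub == raw      # true

# Assumption: the bytes are ISO-8859-1. Relabel a copy, then transcode to UTF-8.
subject = raw.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)

puts subject.encoding         # UTF-8
puts subject.valid_encoding?  # true
puts subject                  # the pound sign survives
```

The relabel step matters: force_encoding only changes the tag on the string, while encode actually converts the bytes, so calling encode on a binary-tagged string gives Ruby no source charset to convert from.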

I'm currently working with the following...

-if email.subject.force_encoding('UTF-8').valid_encoding?
  =email.subject
-else
  "Can't display"

Ideally I'd have something that checks whether the encoding is valid and, if not, does what #scrub is supposed to do. I'd even be perfectly happy to see a literal '\xA3' in the output, so long as it doesn't throw an error and I can basically read the text.
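One way to get that scrub-like behaviour is sketched below, assuming the string is binary-tagged as in the errors above. The helper name safe_subject and the "?" placeholder are made up for this example. The idea is to relabel a copy of the string as UTF-8 first, so that #scrub has invalid bytes to find.

```ruby
# Hypothetical helper: return a string that is always valid UTF-8,
# replacing any byte that is not valid UTF-8 with a placeholder.
def safe_subject(text, placeholder = "?")
  return "" if text.nil?

  # dup so the caller's string (which may be frozen) is left untouched
  text.dup.force_encoding(Encoding::UTF_8).scrub(placeholder)
end

puts safe_subject("Pay \xA350 now".b)   # invalid byte becomes "?"
puts safe_subject("plain ascii".b)      # unchanged
puts safe_subject(nil).inspect          # ""
```

This never raises on bad bytes, but it does lose the £ sign; keeping it would need the real source charset.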

Any ideas on either how to do it properly or a fudge to solve the issue?


Solution

  • After much pain, this is how I solved it.

    You need to set the default encodings in your environment.rb file, like so:

    # Load the rails application
    require File.expand_path('../application', __FILE__)
    Encoding.default_external = Encoding::UTF_8
    Encoding.default_internal = Encoding::UTF_8
    # Initialize the rails application
    Stma::Application.initialize!
    

    Apparently this has something to do with Ruby's roots in Japan. A UTF-8 default wouldn't be helpful when dealing with Japanese (or Russian) characters, so this sort of thing isn't there as standard.

    I've then done the following:

    # Parse the raw RFC822 message fetched over IMAP
    mail_object = Mail.new(mail[0].attr["RFC822"])
    # Treat the text as binary and transcode to UTF-8; bytes outside plain ASCII are dropped
    subject = mail_object.subject.encode!('UTF-8', 'binary', invalid: :replace, undef: :replace, replace: '') if mail_object.subject
    # Prefer the plain-text part, then the HTML part, then the whole message
    body_part = (mail_object.text_part || mail_object.html_part || mail_object).body.decoded
    body = body_part.encode!('UTF-8', 'binary', invalid: :replace, undef: :replace, replace: '') if body_part
    
    from = mail_object.from.join(",") if mail_object.from # handles multiple addresses
    to = mail_object.to.join(",") if mail_object.to # handles multiple addresses
    

    That should get all the main pieces into strings/text that you can easily work with and that won't fail nastily if something's missing or unusual. Hope that helps somebody.
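To make the trade-off in the encode! calls above concrete, here is a small sketch of what transcoding from 'binary' with replace: '' does to non-ASCII input. The sample strings are made up for illustration. Every byte outside plain ASCII has no mapping from binary to UTF-8, so it is removed rather than converted.

```ruby
# Same options as in the answer's encode! calls.
opts = { invalid: :replace, undef: :replace, replace: "" }

latin1_bytes = "Invoice for \xA3100".b   # pound sign as one ISO-8859-1 byte
utf8_bytes   = "caf\u00E9".b             # e-acute as two UTF-8 bytes, tagged binary

stripped_latin1 = latin1_bytes.encode("UTF-8", "binary", **opts)
stripped_utf8   = utf8_bytes.encode("UTF-8", "binary", **opts)

puts stripped_latin1   # Invoice for 100
puts stripped_utf8     # caf
```

So this approach never raises, but the £ sign disappears from the subject, and so do characters that were already valid UTF-8. If keeping them matters, one option is to relabel the bytes with the message's actual charset (when it is known) before encoding, instead of treating them as binary.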