Search code examples
ruby-on-railsruby-on-rails-3utf-8character-encodingdelayed-job

Method fails due to Encoding::UndefinedConversionError: U+03B1 from UTF-8 to ISO-8859-1


I am running DelayedJob and Mechanize on my Rails 3.2.13 app - although those two things might not be part of this problem. My app was working fine until a week ago. Now, when I run my Rails 3.2.13 app in a production environment with my delayed method scrape, it reports back:

Jul 03 13:57:19 myapp app/worker

.1:     (30.7ms)  SELECT COUNT(*) AS count_all, priority AS priority FROM "delayed_jobs" WHERE (run_at < '2013-07-03 20:57:19.344207' and failed_at is NULL) GROUP BY priority` 

`Jul 03 13:58:01 myapp app/worker.1:  [Worker(host:1d2342a-b234f-4342-bcd7-0afsji3e60dab pid:2)] Person#scrape failed with Encoding::UndefinedConversionError: U+03B1 from UTF-8 to ISO-8859-1 - 0 failed attempts` 

`Jul 03 13:58:01 myapp app/worker.1:  2013-07-03T20:58:01+0000: [Worker(host:1d30481a-cf4f-4344-bad7-0e5e0ae60cab pid:2)] Person#scrape failed with Encoding::UndefinedConversionError: U+03B1 from UTF-8 to ISO-8859-1 - 0 failed attempts` 

`Jul 03 13:58:01 myapp app/worker.1:     (3.1ms)  BEGIN 
Jul 03 13:58:01 myapp app/worker.1:     (18.3ms)  UPDATE "delayed_jobs" SET "last_error" = 'U+03B1 from UTF-8 to ISO-8859-1 
Jul 03 13:58:01 myapp app/worker.1:  /app/vendor/bundle/ruby/1.9.1/gems/mechanize-2.7.1/lib/mechanize/util.rb:57:in `encode'' 
Jul 03 13:58:01 myapp app/worker.1:  /app/vendor/bundle/ruby/1.9.1/gems/mechanize-2.7.1/lib/mechanize/util.rb:57:in `encode_to'' 
Jul 03 13:58:01 myapp app/worker.1:  /app/vendor/bundle/ruby/1.9.1/gems/mechanize-2.7.1/lib/mechanize/util.rb:43:in `from_native_charset'' 
Jul 03 13:58:01 myapp app/worker.1:  /app/vendor/bundle/ruby/1.9.1/gems/mechanize-2.7.1/lib/mechanize/form.rb:243:in `from_native_charset'' 

That code of U+03B1 corresponds to the letter alpha (α). However, I never would have written an alpha into my code, as I have no use for it. I thought it may have had to do with my Twitter Bootstrap installation, but I just uninstalled it, and the problem hasn't gone away.

Here is the object it's handling. I noted that in all of the entries I'ved tried, there is always an exclamation point before the mylist attribute. I'm not sure why.

19:51:06 web.1 | SQL (0.8ms) INSERT INTO "delayed_jobs" ("attempts", "created_at", "failed_at", "handler", "last_error", "locked_at", "locked_by", "priority", "queue", "run_at", "updated_at") VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?) [["attempts", 0], ["created_at", Wed, 03 Jul 2013 23:51:06 UTC +00:00], ["failed_at", nil], ["handler", "--- !ruby/object:Delayed::PerformableMethod\nobject: !ruby/ActiveRecord:Doilist\n attributes:\n id: 188\n mylist: ! \"VWQ.U9JF.45.4595.pdf\\r\\nVWQ.U9JF.45.4595.xml\\r\\n \\r\\nVWQ.U9JF.46.1558.pdf\\r\\nVWQ.U9JF.46.1558.xml\\r\\n\n \\r\\nVWQ.U9JF.421234.pdf\\r\\nVWQ.U9JF.461764.xml\\r\\n \\r\\nVWQ.U9JF.434147.pdf\"\n created_at: 2013-07-03 23:51:06.694626000 Z\n updated_at: 2013-07-03 23:51:06.694626000 Z\n myuserid: [email protected]\n mypass: mypassword\n mymonth: '7'\n mydate: '1'\n myyear: '1'\nmethod_name: :scrape\nargs: []\n"], ["last_error", nil], ["locked_at", nil], ["locked_by", nil], ["priority", 0], ["queue", nil], ["run_at", Wed, 03 Jul 2013 23:51:06 UTC +00:00], ["updated_at", Wed, 03 Jul 2013 23:51:06 UTC +00:00]]

I'd be glad to post any files upon request, as I am guessing this is a complicated issue. Thanks very much for your input.


Solution

  • I fixed this thanks to Domon's suggestion. The webpage that Mechanize was interacting with was in ISO-8859-1 format (see more about how to detect that here), whereas my system was trying to read the page in UTF-8 formatting. To fix this, I entered agent.page.encoding = 'utf-8' into my scrape method's script as illustrated in Niels Kristian's answer here. (See also denis.peplin's answer there for further clarification on where to write it.) This allowed a forced conversion of the webpage into the proper formatting (UTF-8) that my system required to read it.