Tags: utf-8, sidekiq, rails-activejob

Sidekiq ActiveJob perform_later with text that includes UTF-8 characters


I just found a bug where I'm calling

    MyJob.perform_later(request.body.read)

in a Sidekiq-backed ActiveJob job.

request.body.read returns some JSON, and I found that in some cases it contains non-ASCII characters (e.g. the € symbol). In that case I get:

    Encoding::UndefinedConversionError Exception: "\xE2" from ASCII-8BIT to UTF-8

I'm aware that Sidekiq advises against complex or long job arguments. What would be a best-practice workaround?

One idea is to Base64-encode the string before passing it to the job, but that makes the argument even longer (roughly a third larger), and I'm not sure whether that would be a problem for Sidekiq. Another is to store the actual JSON text in a database table and pass only the ID of the new row to the job. That would definitely work, but it looks like overkill to me.
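For reference, the Base64 idea can be sketched in plain Ruby. This is a minimal sketch, assuming the job decodes the argument itself; encode_arg and decode_arg are hypothetical helper names, not Sidekiq or ActiveJob APIs. Array#pack("m0") and String#unpack1("m0") are core Ruby strict Base64, so no extra gem is needed. The encoded string is pure ASCII, so JSON.generate accepts it, at the cost of about 4/3 the size:

```ruby
require "json"

# Hypothetical helpers illustrating the Base64 workaround (not a Sidekiq/ActiveJob API).
# pack("m0") is strict Base64: no line breaks, output is plain ASCII.
def encode_arg(raw)
  [raw].pack("m0")
end

def decode_arg(encoded)
  # unpack1("m0") returns binary bytes; relabel them as UTF-8 for use inside the job
  encoded.unpack1("m0").force_encoding(Encoding::UTF_8)
end

# Stand-in for request.body.read, which returns a binary (ASCII-8BIT) string
body = '{"price":"5 €"}'.b

payload  = JSON.generate([encode_arg(body)])      # roughly what Sidekiq would store
restored = decode_arg(JSON.parse(payload).first)  # what the job would work with

puts restored
puts "raw: #{body.bytesize} bytes, Base64: #{encode_arg(body).bytesize} bytes"
```

The size growth is fixed (4 output bytes per 3 input bytes), so whether it matters depends only on how large the request bodies are.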

Any suggestions?


Solution

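  • Sidekiq uses JSON.generate to serialize job arguments, and request.body.read returns a binary (ASCII-8BIT) string. JSON.generate tries to convert each string to UTF-8, and for a binary string that conversion fails on the first byte above 0x7F, here \xE2, the first byte of €. You can reproduce this in the console: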

    require 'json'

    arg = "Example with € character".force_encoding('ASCII-8BIT')
    JSON.generate([arg])
    # raises Encoding::UndefinedConversionError ("\xE2" from ASCII-8BIT to UTF-8)
    

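    One option, as suggested in another answer, is to force the encoding to UTF-8 before you pass the string to perform_later. force_encoding only relabels the bytes (it neither converts nor validates them), so this works as long as the body really is UTF-8, which it is here. Then it serializes correctly: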

    require 'json'

    arg = "Example with € character".force_encoding('ASCII-8BIT')
    arg.force_encoding('UTF-8')
    JSON.generate([arg])
    # => "[\"Example with € character\"]"
    

    So you'd want something like:

    MyJob.perform_later(request.body.read.force_encoding('UTF-8'))
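If you cannot be sure that every client sends UTF-8, a slightly more defensive variant checks the bytes before enqueuing. This is a minimal sketch, assuming you would rather reject a malformed body than alter it; utf8_job_arg is a hypothetical helper, not a Rails or Sidekiq API. It relies only on core String#force_encoding, String#valid_encoding? and, as a commented alternative, String#scrub:

```ruby
require "json"

# Hypothetical helper (not a Rails or Sidekiq API): relabel a binary request body
# as UTF-8, but only if the bytes really form valid UTF-8.
def utf8_job_arg(raw)
  str = raw.dup.force_encoding(Encoding::UTF_8) # dup so the caller's string is not mutated
  return str if str.valid_encoding?

  # Assumption: rejecting the request is preferable to silently changing it.
  # If replacing bad bytes is acceptable, return str.scrub("\uFFFD") instead.
  raise ArgumentError, "request body is not valid UTF-8"
end

good = '{"price":"5 €"}'.b # valid UTF-8 bytes, labelled ASCII-8BIT like a Rack body
bad  = "\xE2\x28\xA1".b    # \xE2 must be followed by continuation bytes; \x28 is not one

puts JSON.generate([utf8_job_arg(good)])

begin
  utf8_job_arg(bad)
rescue ArgumentError => e
  puts "rejected: #{e.message}"
end
```

In the controller you would then call MyJob.perform_later(utf8_job_arg(request.body.read)).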