ruby encoding utf-8 centos grape-api

Why is Ruby Base64 encoding conversion losing data, and how can I work around it?


I am spinning up a Rails UI that talks to a Grape API. This is the second instance of this program. The first instance works well. The second instance's Grape API, however, appears to be corrupting data before sending it over the wire.

I need the image to go from file > json > http > db. Right now I am doing that by sending the file like so: file > string > encode to url-safe base64 > to_json > http > decode > save to sqlite3 db with ActiveRecord. Based on the output below, I'm led to believe the image data is corrupted when I convert it to base64. However, since the Grape API is all JSON, the characters must be encoded before sending (since, at least as far as Ruby's JSON library is concerned, invalid UTF-8 == invalid JSON).

So I either have to know:

  1. How to allow Grape API to send non-JSON (raw file string) or
  2. How to decode the string and avoid the error message

Opening a file and converting its contents to url-safe Base64.

File.open("#{folder}/#{file_name}", "rb:UTF-8") do |image|
  file_as_string = image.read
end
 => "iVBORw0K ... # truncated for length

Things go weird right away. IRB does the expected - encodes as UTF-8.

file_as_string.encoding.name
 => "UTF-8"

BUT. The server logs ASCII-8BIT. I cannot explain this. Every file is topped with Ruby's magic UTF-8 comment. Linux $LANG is set to en_US.UTF-8.

OK, but when Base64 converts I lose the plot anyway. Even in IRB, starting with UTF-8, it down-converts. Why US-ASCII? Regardless, why is compatibility lost?

Base64.urlsafe_encode64(file_as_string).encoding.name
 => "US-ASCII"
Base64.urlsafe_decode64(Base64.urlsafe_encode64(file_as_string)).encoding.name
 => "ASCII-8BIT"
Base64.urlsafe_decode64(Base64.urlsafe_encode64(file_as_string)).encode("UTF-8")
Encoding::UndefinedConversionError: "\x89" from ASCII-8BIT to UTF-8
    from (irb):27:in `encode'
    from (irb):27
    from /home/me/.rvm/rubies/ruby-2.2.1/bin/irb:11:in `<main>'

Note that the error here in IRB is the same one I get a) if I don't base64 encode the string before Grape calls to_json and b) when I decode the string on the Rails side, assign it to a model attribute, and call .save.

The file itself is binary (if that matters?)

$ file -bi /path/to/file.png
image/png; charset=binary

Solutions I've tried, or am unwilling to try:

Sending over the raw image.read

This is a JSON API, so Grape converts to JSON before sending the data over the wire -- meaning any response must be valid JSON, as far as I understand it. If I try to send the raw string over, the automatically-called .to_json throws the same error.

Force-encoding the results

The output is not a readable png.

Downgrading

The original instance is Ruby 1.9.2 and CentOS 6.3. The new instance is Ruby 2.2.1 and CentOS 7. I'm generally committed to moving forward, so I'd rather develop some solution, even if not backward compatible, than roll back Ruby and my OS.

Not using UTF-8

Rails's config/application.rb has the line config.encoding = "utf-8" and config/environment.rb has the lines Encoding.default_external = Encoding::UTF_8; Encoding.default_internal = Encoding::UTF_8. I hope not to have to give up UTF-8 compatibility just for this one issue.


So is there a way to serve a file directly in Grape, bypassing the to_json call? Or is there a different encoding safe for JSON-ing and sending over http?


Solution

  • PNG files do not have character encoding. You should open the file without declaring the character encoding. You do not need to concern yourself with character sets even after base64 encoding.

    Once the file is base64 encoded, the result is a 7-bit ASCII string, hence encoding.name reports "US-ASCII". This is the string you should pass to your framework.

    Do not call .encode() on the string before base64 encoding - this will surely corrupt the string.

    To clarify:

    1. file_as_string is neither UTF-8 nor ASCII. It has no character encoding, as it's a binary file. file_as_string.encoding.name is irrelevant to you.
    2. Base64.urlsafe_encode64(file_as_string).encoding.name = "US-ASCII" is correct as you've effectively made a binary file into a text/character string by encoding it to base64. This does have character encoding - 7bit ASCII. This is what you should be passing to Grape to put on the wire.
    3. Base64.urlsafe_decode64(Base64.urlsafe_encode64(file_as_string)).encoding.name is irrelevant as the result is a binary string again. It has no character encoding. Trying to .encode() this will corrupt the data.
    4. Your IRB fails because you're asking Ruby to convert a binary string to UTF-8 text encoding. That's like taking a picture and asking to convert it to French.
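
The points above can be tried end to end with a minimal sketch like the one below. It makes several assumptions that are not from the question: the sample bytes stand in for a real PNG, the temp file stands in for the image on disk, and the "image" JSON key is a hypothetical name. It reads the file in plain binary mode ("rb", no character encoding declared), Base64-encodes the bytes, wraps them in JSON, and decodes them again on the other side, without ever calling .encode on the binary data.

```ruby
require "base64"
require "json"
require "tempfile"

# Hypothetical sample data: the 8-byte PNG signature plus a few bytes that
# are not valid UTF-8. A real image file would be handled the same way.
sample_bytes = "\x89PNG\r\n\x1A\n\xFF\xFE\x00\x01".b

# Write the sample to a temp file so there is something to read back.
tmp = Tempfile.new("sample")
tmp.binmode
tmp.write(sample_bytes)
tmp.close

# Sending side: open in binary mode without declaring a character encoding.
raw = File.open(tmp.path, "rb") { |image| image.read }
tmp.unlink

# Base64 output is plain 7-bit ASCII, so it is safe to embed in JSON.
encoded = Base64.urlsafe_encode64(raw)
payload = { "image" => encoded }.to_json # "image" is an assumed key name

# Receiving side: parse the JSON, then decode back to raw bytes.
decoded = Base64.urlsafe_decode64(JSON.parse(payload)["image"])

puts raw.encoding
puts encoded.encoding
puts decoded.encoding
puts decoded == raw # true: the bytes survive the round trip

# Transcoding the binary result to UTF-8 is what raises the error
# shown in the question.
encode_failed = false
begin
  decoded.encode("UTF-8")
rescue Encoding::UndefinedConversionError
  encode_failed = true
end
puts encode_failed
```

The decoded string is reported as binary, which is expected for image data. What matters is that its bytes equal the original bytes, so it can be written out or stored as binary as-is.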