I am spinning up a Rails UI that talks to a Grape API. This is the second instance of this program. The first instance works well. The second instance's Grape API, however, appears to be corrupting data before sending it over the wire.
I need the image to go from file > json > http > db. Right now I am doing that by sending the file like so: file > string > encode to url-safe base64 > to_json > http > decode > save to sqlite3 db with ActiveRecord. Based on the below, I'm led to believe the image data is corrupted when I convert it to base64. However, since the Grape API is all JSON, the characters must be encoded before sending (since, at least as far as Ruby's JSON library is concerned, invalid UTF-8 == invalid JSON).
So I either have to know:
Opening a file and converting its contents to url-safe Base64.
File.open("#{folder}/#{file_name}", "rb:UTF-8") do |image|
file_as_string = image.read
end
=> "iVBORw0K ... # truncated for length
Things go weird right away. IRB does the expected - encodes as UTF-8.
file_as_string.encoding.name
=> "UTF-8"
BUT. The server logs ASCII-8BIT. I cannot explain this. Every file is topped with Ruby's magic UTF-8 comment. Linux $LANG is set to en_US.UTF-8.
OK, but when Base64 converts I lose the plot anyway. Even in IRB, starting with UTF-8, it down-converts. Why US-ASCII? Regardless, why is compatibility lost?
Base64.urlsafe_encode64(file_as_string).encoding.name
=> "US-ASCII"
Base64.urlsafe_decode64(Base64.urlsafe_encode64(file_as_string)).encoding.name
=> "ASCII-8BIT"
Base64.urlsafe_decode64(Base64.urlsafe_encode64(file_as_string)).encode("UTF-8")
Encoding::UndefinedConversionError: "\x89" from ASCII-8BIT to UTF-8
from (irb):27:in `encode'
from (irb):27
from /home/me/.rvm/rubies/ruby-2.2.1/bin/irb:11:in `<main>'
Note that the error here in IRB is the same one I get a) if I don't base64 encode the string before Grape tries to_json and b) when I try to decode the string and .save it to a model attribute on the Rails side.
The file itself is binary (if that matters?)
$ file -bi /path/to/file.png
image/png; charset=binary
Solutions I've tried, or am unwilling to try:
Sending over the raw image.read
This is a JSON API, so Grape converts to JSON before sending the data over the wire -- meaning any response must be valid JSON, as far as I understand it. If I try to send the raw string over, the automatically-called .to_json throws the same error.
Force-encoding the results
The output is not a readable png.
Downgrading
The original instance is Ruby 1.9.2 and CentOS 6.3. The new instance is Ruby 2.2.1 and CentOS 7. I'm generally committed to moving forward, so I'd rather develop some solution, even if not backward compatible, than roll back Ruby and my OS.
Not using UTF-8
Rails's config/application.rb has the line config.encoding = "utf-8" and config/environment.rb has the lines Encoding.default_external = Encoding::UTF_8; Encoding.default_internal = Encoding::UTF_8
I hope not to have to give up UTF-8 compatibility just for this one issue.
So is there a way to serve a file directly in Grape, bypassing the to_json call? Or is there a different encoding safe for JSON-ing and sending over http?
PNG files do not have a character encoding. You should open the file without declaring a character encoding (plain "rb" rather than "rb:UTF-8"). You do not need to concern yourself with character sets even after base64 encoding.
Once the file is base64 encoded, the result is a 7-bit ASCII string, hence encoding.name reports "US-ASCII". This is the string you should pass to your framework.
Do not call .encode() on the string before base64 encoding - this will surely corrupt the string.
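As a rough illustration of the sending side, here is a minimal sketch. It assumes the image bytes are already in memory; the eight-byte PNG signature stands in for a real file, and the "image" JSON key is a made-up name, not something Grape requires. With a real file, File.binread(path) would give the same kind of binary string.

```ruby
require "base64"
require "json"

# Hypothetical stand-in for the image: the 8-byte PNG signature as a binary
# (ASCII-8BIT) string. A real file would be read with File.binread(path).
png_bytes = "\x89PNG\r\n\x1A\n".b

# Base64 turns the raw bytes into plain ASCII text; no .encode call is needed.
encoded = Base64.urlsafe_encode64(png_bytes)

# ASCII text is valid UTF-8, so it serializes to JSON without errors.
# The "image" key is an assumption made for this sketch.
payload = JSON.generate({ "image" => encoded })

# The receiving side parses the JSON and decodes back to the original bytes.
decoded = Base64.urlsafe_decode64(JSON.parse(payload)["image"])
```

The only string that ever passes through JSON here is the base64 text, so the character encoding of the raw image string never comes into play.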
To clarify:
file_as_string is neither UTF-8 nor ASCII. It has no character encoding, as it is a binary file. file_as_string.encoding.name is irrelevant to you.
Base64.urlsafe_encode64(file_as_string).encoding.name == "US-ASCII" is correct, as you've effectively made a binary file into a text/character string by encoding it to base64. This does have a character encoding: 7-bit ASCII. This is what you should be passing to Grape to put on the wire.
Base64.urlsafe_decode64(Base64.urlsafe_encode64(file_as_string)).encoding.name is irrelevant, as the result is a binary string again. It has no character encoding. Trying to .encode() this will not give you a usable image: it raises the Encoding::UndefinedConversionError shown in the question, and forcing the conversion through would corrupt the data.
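On the receiving side, a minimal sketch of treating the decoded string as bytes rather than text might look like this. It again uses the PNG signature as a stand-in for a real image and a temporary file as a stand-in for wherever the data is finally stored; both are assumptions made for illustration.

```ruby
require "base64"
require "tempfile"

# Hypothetical stand-in for the image: the 8-byte PNG signature.
original = "\x89PNG\r\n\x1A\n".b
decoded = Base64.urlsafe_decode64(Base64.urlsafe_encode64(original))

# Transcoding binary data to UTF-8 fails, as in the question's IRB session.
transcoding_failed = false
begin
  decoded.encode("UTF-8")
rescue Encoding::UndefinedConversionError
  transcoding_failed = true
end

# Writing the decoded string in binary mode keeps the bytes untouched.
restored = nil
Tempfile.create("roundtrip") do |file|
  file.binmode
  file.write(decoded)
  file.flush
  restored = File.binread(file.path)
end
```

The decoded string is left exactly as Base64 returned it; it is only ever written and read back in binary mode.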