
Is the Net::HTTP Ruby gem ignoring the Content-Type header in my HTTP responses?


When using the Net::HTTP class, I have a problem: even though the response's Content-Type header sets the charset to ISO-8859-1, the encoding of the response body is ASCII-8BIT.

I am not 100% sure why or how these two encodings differ, but I do know that only the ISO-8859-1 encoding lets me transcode the body to UTF-8. To wit:

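require 'net/http'
# uri is assumed to be a URI object for the page being fetched.
# Net::HTTP.start returns the value of its block, so capture it here;
# a local first assigned inside the block is not visible afterwards.
response = Net::HTTP.start(uri.host, uri.port) do |http|
  request = Net::HTTP::Get.new uri
  http.request request
end
response['Content-Type']
 => "text/html;charset=ISO-8859-1"
response.body.encoding
 => #<Encoding:ASCII-8BIT>
response.body.encode(Encoding::UTF_8)
Encoding::UndefinedConversionError: "\xE9" from ASCII-8BIT to UTF-8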

What is going on here? If I call force_encoding on the response's body with Encoding::ISO_8859_1, then the transcoding works.

Is Net::HTTP at fault?


Solution

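  • Net::HTTP does not set the encoding of the response body from the Content-Type header (see ticket); the body is always tagged ASCII-8BIT.

    That is a slightly misleading encoding name, since it actually means "arbitrary binary data": Ruby knows nothing about which characters the bytes above 0x7F represent, so it cannot map a byte such as "\xE9" to a UTF-8 character. This is why you need to use force_encoding first. It changes only the encoding label on the string and leaves the bytes untouched, after which encode can transcode them to other encodings.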
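    The steps described here can be sketched as a small helper. This is only an illustration under stated assumptions: body_as_utf8 is a hypothetical name (it is not part of Net::HTTP), the charset is pulled out of the header value with a simple regular expression, and UTF-8 is assumed when the header names no charset or an unknown one. The sample bytes stand in for a real response body.

```ruby
# Hypothetical helper, not part of Net::HTTP: relabel a binary body with the
# charset named in a Content-Type value, then transcode it to UTF-8.
def body_as_utf8(body, content_type, default: Encoding::UTF_8)
  # Extract e.g. "ISO-8859-1" from "text/html;charset=ISO-8859-1".
  charset = content_type.to_s[/charset=\s*"?([^\s;"]+)/i, 1]

  source = begin
    charset ? Encoding.find(charset) : default
  rescue ArgumentError
    default # unknown charset name: fall back to the assumed default
  end

  # force_encoding only changes the label; encode converts the bytes.
  # dup keeps the caller's string untouched.
  body.dup.force_encoding(source).encode(Encoding::UTF_8)
end

raw = "caf\xE9".b # binary (ASCII-8BIT) bytes, as Net::HTTP hands them back
text = body_as_utf8(raw, "text/html;charset=ISO-8859-1")
puts text
```

    With a real response, the two arguments would be response.body and response['Content-Type'].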