Ruby Thin/Rack strange multibyte characters behavior


The question was re-written.

I'm working on a simple web framework and have run into strange behavior from either Rack or the Thin server I'm using.

I simplified the config.ru file as much as I could and ended up with the following code, which reproduces the problem:

app = Proc.new do |env|
    content = "<p>عربي</p>"
    headers = {'Content-Type' => 'text/html; charset=utf-8', 'Content-Length' => content.length.to_s}
    [200, headers, [content]]
end

run app

The code above is an ordinary Rack application whose body is an HTML paragraph containing a four-letter Arabic word. After starting the server with thin start, I expected the source of the web page to be:

<p>عربي</p>

Instead, it turned out to be:

<p>عربي

That is all, without the closing tag. The server works correctly if I use an English word instead of the Arabic one, so I concluded that the problem is related to encoding or to the multibyte characters of the Arabic text.

I'm using Ruby 1.9.2, and the file is encoded as UTF-8. Ruby works fine if I just run puts "<p>عربي</p>" in the console, without Rack or Thin.

So the problem is simply that a number of characters after the Arabic text disappear when using Rack and Thin, and the number of missing characters equals the number of Arabic characters in the text.

Any thoughts?


Solution

  • Does 'Content-Length' => content.bytesize.to_s improve things?
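
A likely explanation for why this helps: Content-Length is measured in bytes, while String#length in Ruby 1.9 counts characters. The following is a minimal sketch, not the exact code path inside Rack or Thin; it assumes the client simply stops reading after the advertised number of bytes, and the variable names (chars, bytes, truncated) are illustrative only.

```ruby
# encoding: utf-8
# (The magic comment above matters on Ruby 1.9; Ruby 2.0+ assumes UTF-8 source.)

content = "<p>عربي</p>"

chars = content.length    # number of characters
bytes = content.bytesize  # number of bytes in UTF-8; each Arabic letter here takes two

# Simulate a client that stops reading after `chars` bytes,
# which is what a character-based Content-Length tells it to do.
truncated = content.bytes.first(chars).pack("C*").force_encoding("UTF-8")

puts "length:    #{chars}"
puts "bytesize:  #{bytes}"
puts "truncated: #{truncated}"

# Headers with the length measured in bytes (text/html is the standard media type for HTML).
headers = {
  'Content-Type'   => 'text/html; charset=utf-8',
  'Content-Length' => content.bytesize.to_s
}
```

Here the string has 11 characters but 15 bytes. The difference of four, one extra byte per Arabic letter, matches the four missing characters of the closing tag.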