Tags: ruby-on-rails, http, ruby-on-rails-4, utf-8, thinking-sphinx

UTF-8 Characters handled differently on production environment


On my local machine I can search for "Härtefällen", which results in the following URL:

Development

http://myapp.dev/de/incoming?q=H%E4rtef%E4llen

I can submit as many times as I want, it always looks correct:

[screenshot: correct]

Info:

Mac OS X 10.9.5
ruby 2.1.2p95 (2014-05-08 revision 45877) [x86_64-darwin13.0]
thinking-sphinx (3.1.1)
rails (4.0.4)
/usr/local/Cellar/sphinx/2.2.4

locale command:

LANG=
LC_COLLATE="C"
LC_CTYPE="UTF-8"
LC_MESSAGES="C"
LC_MONETARY="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_ALL=

Production

However, on my production environment, when I enter the search term and click "Apply", I get the following result:

[screenshot: weird1]

Curiously, when I keep pressing Apply, the term gets longer and weirder, but somehow the search engine can still see the term "Härtefällen" behind this weird HÃÂâ¬rtefÃÂâ¬llen, because the corresponding search result is displayed:

[screenshot: weird2]

[screenshot: weird3]

Info:

Debian 7.0
ruby 2.1.2p95 (2014-05-08 revision 45877) [x86_64-linux]
rails (4.0.4)
thinking-sphinx (3.1.1)
Package: sphinxsearch Version: 2.0.4-1.1

locale command:

LANG=en_US.UTF-8
LANGUAGE=en_US:en
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=en_US.UTF-8

Bottom line

The only thing I do to the search param (H%E4rtef%E4llen) in my controller is convert it from ISO-8859-15 to UTF-8:

# TODO: Somehow `René` turns into `Ren\xE4`
params[:q] = params[:q].encode('UTF-8', 'ISO-8859-15') rescue nil
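For illustration, here is a minimal sketch of what that line does to input that is already UTF-8. It assumes that production receives the term as UTF-8 bytes, which the post does not confirm. In that case each non-ASCII byte is treated as its own ISO-8859-15 character, so the term would grow with every submit. The helper name `to_utf8` is hypothetical and not part of the app.

```ruby
# Hypothetical helper: transcode only when the bytes are not already valid UTF-8.
def to_utf8(raw)
  utf8 = raw.dup.force_encoding(Encoding::UTF_8)
  return utf8 if utf8.valid_encoding?
  # Assumption: anything that is not valid UTF-8 arrived as ISO-8859-15.
  raw.dup.force_encoding(Encoding::ISO_8859_15).encode(Encoding::UTF_8)
end

latin = "H\xE4rtef\xE4llen".b   # raw bytes as sent in ?q=H%E4rtef%E4llen
utf8  = "H\u00E4rtef\u00E4llen" # the same term as UTF-8 ("Härtefällen")

# Unconditional transcoding, as in the controller line, re-reads the two
# UTF-8 bytes of "ä" (0xC3 0xA4) as two separate ISO-8859-15 characters.
double = utf8.encode('UTF-8', 'ISO-8859-15')

puts to_utf8(latin) # the ISO-8859-15 bytes are converted once
puts to_utf8(utf8)  # already UTF-8, returned unchanged
puts double         # mojibake, two characters longer than the input
```

The validity check is only a heuristic: a few ISO-8859-15 byte sequences also happen to be valid UTF-8, so it would be more robust to make sure every request carries UTF-8 in the first place.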

Now how do I get the sane behaviour on production? Please let me know if I can provide any more relevant information.


Solution

  • I figured out what I was doing wrong:

    1. I have a form which I use to POST data to the server.
    2. The server redirects to a new URL with GET parameters.
    3. In step 1 the characters are properly encoded, but in step 2, where I build the new URL, I need to escape it using URI.encode:

      URI.encode(myURL)

    so that, for example, ö becomes %C3%B6.
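The escaping in step 3 could be sketched with Ruby's standard library as follows. `URI.encode` existed in the Ruby 2.1 used here but was removed in Ruby 3.0, so this sketch swaps in `URI.encode_www_form`, which is not the call from the answer. The path `/de/incoming` is taken from the question; the helper name `build_search_url` is hypothetical.

```ruby
require 'uri'

# Hypothetical helper: build the redirect target with a percent-encoded query string.
def build_search_url(path, term)
  # encode_www_form percent-encodes the UTF-8 bytes of each key and value.
  "#{path}?#{URI.encode_www_form('q' => term)}"
end

url = build_search_url('/de/incoming', "H\u00E4rtef\u00E4llen")
puts url # /de/incoming?q=H%C3%A4rtef%C3%A4llen

# A single component is escaped the same way:
puts URI.encode_www_form_component("\u00F6") # %C3%B6
```

Note that these form-encoding helpers write a space as `+` rather than `%20`, which is fine for a query string but not for a path segment.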