Tags: ruby-on-rails, ruby, encoding, special-characters, carrierwave

Weird Characters encoding


I am seeing weird behaviour in my params: they are passed as UTF-8, but special characters are not handled properly. Instead of one special character, I get two: the plain letter followed by the accent.

Parameters: {"name"=>"Mylène.png", "_cardbiz_session"=>"be1d5b7a2f27c7c4979ac4c16fe8fc82", "authenticity_token"=>"9vmJ02DjgKYCpoBNUcWwUlpxDXA8ddcoALHXyT6wrnM=", "asset"=>{"file"=>#<ActionDispatch::Http::UploadedFile:0x007f94d38d37d0 @original_filename="Mylène.png", @content_type="image/png", @headers="Content-Disposition: form-data; name=\"asset[file]\"; filename=\"Myle\xCC\x80ne.png\"\r\nContent-Type: image/png\r\n", @tempfile=#<File:/var/folders/q5/yvy_v9bn5wl_s5ccy_35qsmw0000gn/T/RackMultipart20130805-51100-1eh07dp>>}, "id"=>"copie-de-sm"}

I log this:

  • logger.debug file_name
  • logger.debug file_name.chars.map(&:to_s).inspect

Each time, same result:

  • Mylène
  • ["M", "y", "l", "e", "̀", "n", "e"]

Since I am trying to match the filename against existing names that are properly encoded in UTF-8, you can see my problem ;)
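To make the mismatch concrete, here is a minimal sketch that is not taken from the application itself. The variable names are only illustrative. It builds the same name twice, once with the precomposed "è" (U+00E8) and once with "e" followed by the combining grave accent (U+0300, bytes CC 80, as seen in the Content-Disposition header above), and shows that Ruby treats them as different strings:

```ruby
# encoding: utf-8

# Precomposed form: "è" is the single code point U+00E8.
composed = "Myl\u00E8ne"

# Decomposed form: plain "e" followed by the combining grave accent U+0300.
decomposed = "Myle\u0300ne"

# Both render the same on screen, but they differ in length and content.
puts composed.chars.length    # 6
puts decomposed.chars.length  # 7
puts composed == decomposed   # false

# Hex dump of the decomposed string; note the CC 80 pair for U+0300.
hex = decomposed.bytes.map { |b| format('%02X', b) }.join(' ')
puts hex                      # 4D 79 6C 65 CC 80 6E 65
```

Both strings are valid UTF-8, so checking the encoding alone does not reveal the difference.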

  • Encodings are UTF-8 everywhere.
  • Running Ruby 1.9.3 and Rails 3.2.14.
  • Added # encoding: utf-8 at the top of every file involved.

If anyone has an idea, I'll take it!

I also opened an issue here: https://github.com/carrierwaveuploader/carrierwave/issues/1185 but I'm not sure whether it is a CarrierWave issue or me missing something...


Solution

  • This seems to be linked to Mac OS X.

    https://www.ruby-forum.com/topic/4407424 explains it and refers to https://bugs.ruby-lang.org/issues/7267 for more details and discussion.

    Mac OS X decomposes special characters, producing utf8-mac rather than regular utf-8...

    Since you can't detect the encoding of a file name, just assume it.

    Thanks to our Linux guy, on whose machine it works properly. ;)

    file_name.encode!('utf-8', 'utf-8-mac').chars.map(&:to_s)
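As a rough sketch of how the one-liner above could be wrapped for reuse: `normalize_filename` is a hypothetical helper name, not part of Rails or CarrierWave. It uses the non-mutating `encode` so it also works on frozen strings, and it assumes the incoming name may be in the decomposed Mac form:

```ruby
# encoding: utf-8

# Hypothetical helper (not a Rails or CarrierWave API): treat the incoming
# name as UTF8-MAC (decomposed) and transcode it to regular UTF-8, which
# recombines "e" + U+0300 into the single code point U+00E8.
def normalize_filename(name)
  name.encode('UTF-8', 'UTF8-MAC')
end

uploaded = "Myle\u0300ne.png"   # as sent by the Mac client
stored   = "Myl\u00E8ne.png"    # as already stored, precomposed

puts uploaded == stored                      # false
puts normalize_filename(uploaded) == stored  # true
```

On Ruby 2.2 and later, `String#unicode_normalize(:nfc)` is another way to get the precomposed form. It is not available on the Ruby 1.9.3 used in the question.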