Tags: ruby-on-rails-3, utf-8, truncate, multibyte-characters

Rails truncate UTF-8 strings containing é (for example)


I am working on a Rails 3.1 app with Ruby 1.9.3 and Mongoid as my ORM. I am facing an annoying issue: I would like to truncate the content of a post like this:

<%= raw truncate(strip_tags(post.content), :length => 200) %>

I am using raw and strip_tags because post.content is produced by a rich text editor and contains HTML.

I have a serious issue with non-ASCII characters. Imagine my post content is the following:

éééé éééé éééé éééé éééé éééé éééé éééé

The naive approach above produces this:

éééé éééé éééé éééé éééé &eac... 

It looks like truncate sees each word of the string as &eacute;&eacute;&eacute;&eacute;, i.e. it counts the HTML entities rather than the characters they stand for.
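A quick way to see why the count is off, assuming (as the output suggests) that the rich text editor stores é as the named HTML entity &eacute;: in Ruby 1.9+ a UTF-8 é is one character (two bytes), while its entity form is eight ASCII characters. A 200-character limit on entity-encoded text therefore keeps only about 200 / 8 = 25 visible characters. This is plain Ruby, no Rails needed:

```ruby
# encoding: utf-8
raw     = "éééé"           # the text as the reader sees it
encoded = "&eacute;" * 4   # assumed storage format from the rich text editor

puts raw.length        # 4 characters
puts raw.bytesize      # 8 bytes in UTF-8
puts encoded.length    # 32 characters, all ASCII

# Roughly how many visible characters survive a 200-character cut
# when every character is stored as an 8-character entity:
puts 200 / "&eacute;".length  # 25
```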

Is there a way to either:

  1. Have truncate handle actual UTF-8 strings, where 'é' counts as a single character? That would be my preferred approach.
  2. Tweak the above call so that the result is better, for example by forcing Rails to truncate between two words?
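Both options can be sketched in plain Ruby, under the assumption that the text still contains named entities such as &eacute; after strip_tags. The FRENCH_ENTITIES table and the decode_entities and truncate_words helpers are hypothetical names for illustration; a real application would more likely use a complete entity decoder (for example the htmlentities gem) than a hand-written table. For option 2 alone, the Rails truncate helper also accepts a :separator option (e.g. :separator => ' ') that cuts at a word boundary.

```ruby
# encoding: utf-8
# Hypothetical table covering just a few French letters.
FRENCH_ENTITIES = {
  "eacute" => "é", "Eacute" => "É", "egrave" => "è", "ecirc" => "ê",
  "agrave" => "à", "ugrave" => "ù", "ccedil" => "ç"
}.freeze

# Option 1: turn entities back into real UTF-8 characters so that each
# visible character counts once. Unknown named entities are left untouched.
def decode_entities(text)
  text.gsub(/&(#\d+|[a-z]+);/i) do
    entity, name = $~[0], $~[1]     # e.g. "&eacute;" and "eacute"
    if name.start_with?("#")
      [name[1..-1].to_i].pack("U")  # numeric entity, e.g. &#233; -> é
    else
      FRENCH_ENTITIES.fetch(name, entity)
    end
  end
end

# Option 2: cut to at most `length` characters (omission included),
# backing up to the last space so that no word is split.
def truncate_words(text, length, omission = "...")
  return text if text.length <= length
  stop  = length - omission.length  # room left for the omission
  space = text.rindex(" ", stop)    # last space at or before `stop`
  stop  = space if space
  text[0, stop] + omission
end

stored  = "&eacute;&eacute;&eacute;&eacute; " * 8  # assumed editor output
visible = decode_entities(stored.strip)

puts visible.length               # 39
puts truncate_words(visible, 20)  # éééé éééé éééé...
```

In the view, the idea would be to decode after strip_tags and truncate afterwards. Because decoding numeric entities can produce characters such as < (from &#60;), the decoded result should go through ERB's normal escaping rather than raw.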

I am asking this question because I have not found any solution so far. This is the only place in my app where I have problems with such characters, and it is a major issue, since the whole content of the website is in French and therefore contains a lot of é, ç, à and ù.

Also, I think this behavior is quite unfortunate for the truncate helper, because in my case it does not keep 200 characters at all, only about 25!


Solution

  • Probably too late to help with your issue, but you can use the ActiveSupport::Multibyte::Chars limit method, which trims the string to at most the given number of bytes without splitting a multibyte character:

    post.content.mb_chars.limit(200).to_s
    

    See http://api.rubyonrails.org/v3.1.1/classes/ActiveSupport/Multibyte/Chars.html#method-i-limit

    I was having a very similar issue (truncating strings in different languages), and this worked in my case, after making sure the encoding was set to UTF-8 everywhere: Rails config, database config and/or database table definitions, and any HTML templates.
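Since limit counts bytes rather than characters, it helps to see how that differs from a character-based cut. The following plain-Ruby approximation keeps whole characters until the byte budget is used up; the limit_bytes name is hypothetical and this is not ActiveSupport's actual implementation:

```ruby
# encoding: utf-8
# Hypothetical helper: keep whole characters while the running byte
# count stays within max_bytes, so no multibyte character is split.
def limit_bytes(str, max_bytes)
  used = 0
  str.each_char.take_while { |ch| (used += ch.bytesize) <= max_bytes }.join
end

text = "éééé éééé"  # 9 characters, 17 bytes in UTF-8

puts limit_bytes(text, 5)  # éé     (4 bytes; a third é would need 6)
puts text[0, 5]            # "éééé " (5 characters, 9 bytes)
```

With French text, a 200-byte limit usually leaves fewer than 200 visible characters, because each accented letter takes two bytes. If the goal is 200 visible characters, a character-based cut such as text[0, 200] (Ruby 1.9+ strings index by character) is the closer fit, and entity-encoded input still has to be decoded first.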