Tags: ruby-on-rails, ruby-on-rails-4, arabic, hebrew, friendly-id

Rails friendly id with non-Latin characters


I have a model that uses FriendlyId to generate its slug:

extend FriendlyId
friendly_id :slug_candidates, :use => :scoped, :scope => :account

def slug_candidates
  :title_and_sequence
end


def title_and_sequence
  slug = normalize_friendly_id(title)
  # ...
  # some logic to add a sequence number in case of a collision
  # ...
end

My problem is that when the title contains non-Latin characters (Arabic, Hebrew, ...), I get an empty slug. Is there a nice and easy solution?
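A likely cause, sketched under assumptions: FriendlyId's default normalize_friendly_id calls ActiveSupport's parameterize, which transliterates the string through I18n and then turns every run of characters outside [a-z0-9-_] into a dash. Characters with no Latin transliteration become "?" and get stripped, so a purely Hebrew or Arabic title ends up empty. The hypothetical approximate_parameterize below is a dependency-free imitation of that pipeline, not the real ActiveSupport code; it only folds accents and does not reproduce I18n's full transliteration table.

```ruby
# Hypothetical, dependency-free approximation of ActiveSupport's parameterize
# (NOT the real implementation). Assumption: characters with no Latin
# equivalent become "?", which is I18n.transliterate's default replacement.
def approximate_parameterize(title)
  # Fold accented Latin letters to ASCII: "ó" -> "o" + combining accent -> "o"
  folded = title.unicode_normalize(:nfd).gsub(/\p{Mn}/, "")
  # Everything still outside ASCII (Hebrew, Arabic, ...) becomes "?"
  transliterated = folded.gsub(/[^\x00-\x7F]/, "?")
  # Turn every run of unwanted characters (including "?") into a dash
  slug = transliterated.gsub(/[^a-z0-9\-_]+/i, "-")
  # Collapse repeated dashes and trim them from both ends
  slug.squeeze("-").gsub(/\A-|-\z/, "").downcase
end

p approximate_parameterize("Helló Világ") # => "hello-vilag"
p approximate_parameterize("שלום עולם")   # => ""
```

Latin titles survive because their accents fold away; Hebrew and Arabic titles collapse to nothing.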


UPDATE

To clarify my question: I would like the same behaviour as WordPress, i.e.:

+--------------------+----------------------------------------------------+
| Title              | url                                                |
+--------------------+----------------------------------------------------+
| Hello World!!      | /hello-world                                       |
+--------------------+----------------------------------------------------+
| Helló Világ        | /hello-vilag                                       |
+--------------------+----------------------------------------------------+
| שלום עולם          | /%D7%A9%D7%9C%D7%95%D7%9D-%D7%A2%D7%95%D7%9C%D7%9D |
+--------------------+----------------------------------------------------+
| مرحبا              | /%D9%85%D8%B1%D8%AD%D8%A8%D8%A7                    |
+--------------------+----------------------------------------------------+

(Modern browsers display both the Arabic and the Hebrew URLs with their original, readable characters.)
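To make the target behaviour concrete, here is a hypothetical wordpress_style_slug helper that reproduces every row of the table. It is neither WordPress's own implementation nor part of FriendlyId; it assumes the rules are: strip Latin accents, keep letters and digits from any script, join the words with dashes, and percent-encode each word's UTF-8 bytes.

```ruby
require "uri"

# Hypothetical helper (not WordPress or FriendlyId code): WordPress-like slug.
def wordpress_style_slug(title)
  # Strip Latin accents: decompose, then drop the combining marks
  folded = title.unicode_normalize(:nfd).gsub(/\p{Mn}/, "")
  # Keep runs of letters/digits from any script; punctuation and spaces split words
  words = folded.downcase.scan(/[\p{L}\p{N}]+/)
  # Percent-encode each word's UTF-8 bytes (ASCII letters/digits stay as-is)
  words.map { |word| URI.encode_www_form_component(word) }.join("-")
end

p wordpress_style_slug("Helló Világ") # => "hello-vilag"
p wordpress_style_slug("שלום עולם")   # => "%D7%A9%D7%9C%D7%95%D7%9D-%D7%A2%D7%95%D7%9C%D7%9D"
```

The encoded form is what ends up in the URL; the browser decodes it back for display.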


Solution

    Thanks to @michalszyndel's notes and ideas, I arrived at the following solution. I hope it helps others.

    First, how to keep non-Latin characters in the slug (as percent-encoded UTF-8):

    extend FriendlyId
    friendly_id :slug_candidates, :use => :scoped, :scope => :account
    
    def slug_candidates
      :title_and_sequence
    end
    
    def title_and_sequence
      # Replace each Hebrew letter with its percent-encoded UTF-8 bytes
      title_unicode = heb_to_unicode(title)

      slug = normalize_friendly_id(title_unicode)
      # ...
      # some logic to add a sequence number in case of a collision,
      # plus whatever else you need in your slug
      # ...
    end
    
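    def heb_to_unicode(str)
      heb_chars = 'אבגדהוזחטיכךלמםנןסעפףצץקרשת'
      # Map each Hebrew letter to its percent-encoded UTF-8 bytes, e.g. 'ש' => '%D7%A9'
      # (URI::encode was removed in Ruby 3.0; encode_www_form_component gives the same result here)
      heb_map = heb_chars.each_char.map { |c| [c, URI.encode_www_form_component(c)] }.to_h
      # This regex matches any single Hebrew letter
      heb_re = Regexp.new(heb_map.keys.map { |x| Regexp.escape(x) }.join('|'))
      # gsub with a Hash replaces each match with its mapped value
      str.gsub(heb_re, heb_map)
    end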
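heb_to_unicode only covers the 27 Hebrew letters, so Arabic titles would still produce an empty slug. A possible generalization, sketched with the hypothetical name percent_encode_non_latin, uses Ruby's Unicode script properties \p{Hebrew} and \p{Arabic} instead of a hand-written letter list, and URI.encode_www_form_component in place of URI::encode (which was removed in Ruby 3.0).

```ruby
require "uri"

# Hypothetical generalization of heb_to_unicode: percent-encode every character
# whose Unicode script is Hebrew or Arabic; everything else is left untouched
# for normalize_friendly_id to handle.
NON_LATIN_SCRIPT = /[\p{Hebrew}\p{Arabic}]/

def percent_encode_non_latin(str)
  # Each match is a single character; its UTF-8 bytes become %XX sequences
  str.gsub(NON_LATIN_SCRIPT) { |char| URI.encode_www_form_component(char) }
end

p percent_encode_non_latin("שלום")   # => "%D7%A9%D7%9C%D7%95%D7%9D"
p percent_encode_non_latin("مرحبا") # => "%D9%85%D8%B1%D8%AD%D8%A8%D8%A7"
```

Adding another script is a matter of extending the character class, e.g. with \p{Cyrillic}.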
    

    I also had to override normalize_friendly_id so that it doesn't strip the % signs.
    I copied the body of ActiveSupport's parameterize method and added % to the character class of permitted characters:

    def normalize_friendly_id(string)
      # replace accented chars with their ascii equivalents
      parameterized_string = I18n.transliterate(string)
    
      sep = '-'
    
      # Turn unwanted chars into the separator
      # We permit % in order to allow unicode in slug
      parameterized_string.gsub!(/[^a-zA-Z0-9\-_\%]+/, sep)
      unless sep.nil? || sep.empty?
        re_sep = Regexp.escape(sep)
        # No more than one of the separator in a row.
        parameterized_string.gsub!(/#{re_sep}{2,}/, sep)
        # Remove leading/trailing separator.
        parameterized_string.gsub!(/^#{re_sep}|#{re_sep}$/, '')
      end
      parameterized_string.downcase
    end
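To see what the overridden method does to already-encoded input, here is a standalone copy of its separator logic under the hypothetical name normalize_keeping_percent. Assumption: the I18n.transliterate step is left out, because after percent-encoding a Hebrew title is plain ASCII and transliteration would not change it. Note that the final downcase also lowercases the hex digits.

```ruby
# Hypothetical standalone copy of the overridden normalize_friendly_id,
# minus the I18n.transliterate step (input assumed to be ASCII already).
def normalize_keeping_percent(string, sep = "-")
  re_sep = Regexp.escape(sep)
  # Turn unwanted chars into the separator; % is permitted so encoded bytes survive
  slug = string.gsub(/[^a-zA-Z0-9\-_%]+/, sep)
  # No more than one separator in a row, and none at either end
  slug = slug.gsub(/#{re_sep}{2,}/, sep).gsub(/\A#{re_sep}|#{re_sep}\z/, "")
  # downcase also turns %D7%A9 into %d7%a9
  slug.downcase
end

p normalize_keeping_percent("%D7%A9%D7%9C%D7%95%D7%9D %D7%A2%D7%95%D7%9C%D7%9D")
# => "%d7%a9%d7%9c%d7%95%d7%9d-%d7%a2%d7%95%d7%9c%d7%9d"
```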
    

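    Now, if I save a model with the title שלום, its slug is stored as %d7%a9%d7%9c%d7%95%d7%9d (lowercase, because normalize_friendly_id ends with downcase).
    Rails decodes the id in the URL back to שלום, so to find the record with the friendly finder I re-encode it and downcase it: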

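    # params[:id] arrives decoded (e.g. "שלום"); re-encode it to match the stored slug
    id = URI.encode_www_form_component(params[:id]).downcase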
    Page.friendly.find(id)
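A minimal sketch of that lookup conversion outside Rails, using the hypothetical helper name slug_for_lookup. It assumes the router has already decoded the path segment (simulated here with URI.decode_www_form_component) and that slugs contain only letters, digits and dashes, for which URI.encode_www_form_component and the old URI::encode produce the same output.

```ruby
require "uri"

# Hypothetical helper: turn the decoded :id from the URL back into the stored
# slug format (percent-encoded UTF-8 bytes with lowercase hex digits).
def slug_for_lookup(decoded_id)
  URI.encode_www_form_component(decoded_id).downcase
end

# Simulate the decoding Rails does for a path like /pages/%D7%A9%D7%9C%D7%95%D7%9D
# (the /pages/ route is only an illustration)
decoded = URI.decode_www_form_component("%D7%A9%D7%9C%D7%95%D7%9D")
p slug_for_lookup(decoded) # => "%d7%a9%d7%9c%d7%95%d7%9d"
```

Latin slugs such as hello-world pass through unchanged, so the same finder code works for every title.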