I have a Phoenix/Elixir app and need my strings to contain only ASCII characters. From what I tried and found here, this can only be done properly with iconv.
:iconv.convert "utf-8", "ascii//translit", "árboles más grandes"
# expected: "arboles mas grandes"
but when I run it on my Mac, it returns:
# "'arboles m'as grandes"
It seems to emit several characters for every character that takes more than one byte, and they come out in the wrong order: "á" becomes "'a" (accent first, then the letter) instead of just "a".
I'm running it with IEx 1.2.5 on a Mac.
Is there any way around this, or more generally a better way to get the same behaviour as Rails' transliterate?
EDIT:
Here is the updated Rails-like helper, based on the accepted answer by Henrik N. It does the same thing as Rails' parameterize (it turns any string into something you can use as part of a URL):
defmodule RailsLikeHelpers do
  require Inflex

  # Replace accented characters with their ASCII equivalents.
  # Decomposing to NFD first stops iconv from emitting sequences like 'a.
  # (Elixir has no `return` keyword, and String.normalize needs a form argument.)
  def transliterate_string(string) do
    :iconv.convert("utf-8", "ascii//translit", String.normalize(string, :nfd))
  end

  def parameterize_string(string, separator \\ "_") do
    string
    |> String.strip() # use String.trim/1 on Elixir 1.3 and later
    |> transliterate_string()
    |> Inflex.parameterize(separator) # turns "Your Momma" into "your_momma"
    # No more than one separator in a row (the group also handles multi-character separators)
    |> String.replace(~r/(?:#{Regex.escape(separator)}){2,}/, separator)
  end
end
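If pulling in Inflex only for the parameterize step is undesirable, a rough dependency-free approximation might look like the sketch below. The module name ParameterizeSketch is hypothetical and it does not reproduce Inflex.parameterize/2 exactly; it assumes the input has already been transliterated to ASCII, lowercases it, replaces every run of non-alphanumeric characters with a single separator (so no separate collapsing pass is needed), and trims separators from both ends.

```elixir
defmodule ParameterizeSketch do
  # Hypothetical, dependency-free approximation of a parameterize step.
  # Assumes `ascii` is already ASCII (e.g. the output of a transliteration).
  def parameterize(ascii, separator \\ "_") do
    sep = Regex.escape(separator)

    ascii
    |> String.downcase()
    # Replace each run of characters other than a-z and 0-9 with one separator
    |> String.replace(~r/[^a-z0-9]+/, separator)
    # Strip separators left at the start or end of the string
    |> String.replace(~r/\A(?:#{sep})+|(?:#{sep})+\z/, "")
  end
end
```

For example, `ParameterizeSketch.parameterize("  Arboles MAS grandes!! ")` gives "arboles_mas_grandes", and `ParameterizeSketch.parameterize("Your Momma", "-")` gives "your-momma".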
Running the string through Unicode decomposition first (as people hinted at in the forum thread you linked to) seems to fix it on my OS X:
iex> :iconv.convert "utf-8", "ascii//translit", String.normalize("árboles más grandes", :nfd)
"arboles mas grandes"
Decomposition normalizes the string so that, for example, "á" is represented as two Unicode codepoints ("a" followed by a combining accent) rather than the composed form, where it is a single codepoint. So it seems iconv's ASCII transliteration removes standalone combining accents/diacritics, but converts composed characters into sequences like 'a.
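If you would rather avoid iconv (and its platform-dependent transliteration) altogether, the same decomposition idea can be sketched in plain Elixir: normalize to NFD, then delete the combining marks (Unicode category Mn) with a regex. This is a swapped-in technique, not iconv's //TRANSLIT: the name AsciiSketch.to_ascii/1 is hypothetical, and characters that have no decomposition are simply dropped rather than transliterated.

```elixir
defmodule AsciiSketch do
  # Hypothetical helper: strips diacritics without iconv.
  # It is NOT a full transliteration; characters with no Unicode
  # decomposition (e.g. "ß", "ø") are removed instead of mapped.
  def to_ascii(string) do
    string
    # Decompose: "á" becomes "a" followed by U+0301 (combining acute accent)
    |> String.normalize(:nfd)
    # Remove the combining marks (Unicode category Mn)
    |> String.replace(~r/\p{Mn}/u, "")
    # Drop anything that is still outside the ASCII range
    |> String.replace(~r/[^\x00-\x7F]/u, "")
  end
end
```

With this, `AsciiSketch.to_ascii("árboles más grandes")` returns "arboles mas grandes", but "Straße" becomes "Strae" because "ß" has no decomposition (iconv's transliteration can map it to "ss" instead), so iconv remains the better fit when such characters matter.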