
Regex, English to Pig Latin - how to fix capitalization


Edit: the original question was off topic, editing to correct.

I'm working through some coding challenges while learning Elixir and came across one to translate English into Pig Latin, as described on Wikipedia.

I started working out regex for different rules and realized doing it all in one shouldn't be that difficult. After playing with it a little I've arrived at the following to match and convert a single word at a time.

Elixir is supposed to use PCRE-compatible regex, but I haven't been able to figure out a way to get \u (convert the next character to upper case) and \L (convert all following characters to lower case) to work in an Elixir string replacement. I've tried several variations of working those into the replacement string, but I'm completely failing to find one that works.

Is there a way to do this using pure regex in Elixir's String.replace, or do I need to handle the rest in regular code?

iex(21)> regex = ~r/(^(?:[aeiouAEIOU]|[XYxy][^aeiouy])(?:.*))|(?:^([A-Z][^aeiou]*(?:u)?)([aeiouy].*))|(?:^([^aeiou]*(?:u)?)([aeiouy].*))/
~r/(^(?:[aeiouAEIOU]|[XYxy][^aeiouy])(?:.*))|(?:^([A-Z][^aeiou]*(?:u)?)([aeiouy].*))|(?:^([^aeiou]*(?:u)?)([aeiouy].*))/
iex(22)> String.replace("Squirl", regex, "\\1\\u\\3\\L2\\5\\4ay")
"\\uirl\\L2ay"
iex(23)> String.replace("Squirl", regex, "\\1\\3\\2\\5\\4ay")
"irlSquay"

Original question below:

One note: the challenge I'm completing says that words that start with a vowel just get 'ay' appended to the end. Some other instructions say "way" or "yay".

PowerShell version:

[Regex]$reg = '(^(?:[aeiou]|[xy][^aeiouy])(?:.*))|(?:^([^aeiou]*(?:u)?)([aeiouy].*))'
'powershell' -replace $reg, ('$1$3$2' + 'ay')

Elixir version:

regex = ~r/(^(?:[aeiou]|[xy][^aeiouy])(?:.*))|(?:^([^aeiou]*(?:u)?)([aeiouy].*))/i
String.replace("elixir", regex, "\\1\\3\\2ay")

This seems too easy. Are there cases I'm missing?


Solution

  • From the Elixir Regex documentation, you can see that it is built on Erlang's :re module, whose documentation clearly states:

    The matching algorithms of the library are based on the PCRE library, but not all of the PCRE library is interfaced

    And then:

    Unsupported Escape Sequences

    In Perl, the sequences \l, \L, \u, and \U are recognized by its string handler and used to modify the case of following characters. PCRE does not support these escape sequences.


    Workaround

    You have to use String.replace with "a function that receives the matched pattern and must return the replacement as a string or iodata" as the replacement (third) parameter.
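As a minimal sketch of that workaround (not the only way to do it): the example below reuses the case-insensitive regex from the question and passes a function to Regex.replace/3, which hands the whole match and each capture group to the function as separate arguments, with unmatched groups arriving as empty strings. The module name PigLatin, the function name translate, and the choice to emulate \u and \L with String.capitalize/1 are assumptions made for illustration.

```elixir
defmodule PigLatin do
  # Hypothetical helper: translates a single word and restores capitalization.
  def translate(word) do
    # Same pattern as the question's case-insensitive version:
    #   group 1 = a word that starts with a vowel (or x/y followed by a consonant)
    #   group 2 = the leading consonants, plus a "u" directly after them
    #   group 3 = the rest of the word
    regex = ~r/(^(?:[aeiou]|[xy][^aeiouy])(?:.*))|(?:^([^aeiou]*(?:u)?)([aeiouy].*))/i

    Regex.replace(regex, word, fn whole, vowel_word, onset, rest ->
      # Equivalent of the "\\1\\3\\2ay" replacement string.
      pig = vowel_word <> rest <> onset <> "ay"

      # Stand-in for \u and \L: if the original word started with an
      # upper-case letter, upper-case the first letter and lower-case the rest.
      if whole =~ ~r/^[A-Z]/, do: String.capitalize(pig), else: pig
    end)
  end
end
```

With this sketch, PigLatin.translate("Squirl") returns "Irlsquay" and PigLatin.translate("elixir") returns "elixiray". Doing the case handling in ordinary Elixir code keeps the regex itself unchanged, so the capitalization rule can be adjusted without touching the pattern.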