
Perl Regex for Substituting Any Character


Essentially, I want to replace the u that sits between a random character and the k with an o. The output I should get from the substitution is dudok and rujok.

How can I do this in Perl? I'm very new to Perl so go easy on me.

This is what I have right now:

$text = "duduk, rujuk";
$_ = $text;
s/.uk/ok/g;
print $_; #Output: duok, ruok Expected: dudok, rujok

EDIT: Forgot to mention that the last syllable is the only one that should be changed. Also, the random character is specifically supposed to be a random consonant, not just any random character.

I should mention that this is all based on Malay language rules for grapheme to phoneme conversion.


Solution

  • According to this page, the Malay language uses an unaccented Latin alphabet, and it has the same consonants as the English language. However, its digraphs are different from those of English.

    • ai vowel
    • au vowel
    • oi vowel
    • gh consonant
    • kh consonant
    • ng consonant
    • ny consonant
    • sy consonant

    So, if one wanted to find a syllable ending with uk, one would look for

    <syllable_boundary>(?:[bcdfhjlmpqrtvwxyz]|gh?|kh?|n[gy]?|sy?)uk
    

    or

    <syllable_boundary>uk
    

    The OP is specifically not interested in the latter (the edit asks for a consonant before uk), so we simply need to look for

    <syllable_boundary>(?:[bcdfhjlmpqrtvwxyz]|gh?|kh?|n[gy]?|sy?)uk
    

    So now, we have to determine how to find a syllable boundary. ...or do we? All the consonant digraphs end with a consonant, and none of the vowel digraphs end in a consonant, so it is enough to check that the letter immediately before uk is a consonant. We simply need to look for

    [bcdfghjklmnpqrstvwxyz]uk
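
    As a rough sanity check of that simplification, the sketch below compares the digraph-aware onset pattern (without the unspecified syllable-boundary part) against the single-consonant class on a few sample strings. The names $onset, $last_consonant and patterns_agree are made up for this illustration, and the word list is only a small sample, not a proof.

    ```perl
    use strict;
    use warnings;

    # Digraph-aware onset: one consonant letter, or one of gh, kh, ng, ny, sy.
    # (Hypothetical helper pattern; the syllable boundary is deliberately left out.)
    my $onset = qr/(?:[bcdfhjlmpqrtvwxyz]|gh?|kh?|n[gy]?|sy?)/;

    # Simplified form: only the single letter directly before "uk" is checked.
    my $last_consonant = qr/[bcdfghjklmnpqrstvwxyz]/;

    # Returns 1 if both patterns give the same yes/no answer for the word.
    sub patterns_agree {
        my ($word) = @_;
        my $digraph_aware = $word =~ /${onset}uk/          ? 1 : 0;
        my $simplified    = $word =~ /${last_consonant}uk/ ? 1 : 0;
        return $digraph_aware == $simplified ? 1 : 0;
    }

    # Sample strings chosen for illustration only.
    for my $word (qw(duduk rujuk khusyuk uk)) {
        printf "%-8s %s\n", $word, patterns_agree($word) ? "agree" : "differ";
    }
    ```

    All four samples should report "agree": the first three match under both patterns, and the bare "uk" matches under neither.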
    

    Finally, we can use \b (a word boundary) to require that uk ends the word, so we're interested in matching

    [bcdfghjklmnpqrstvwxyz]uk\b
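
    To see what the \b adds, here is a small sketch that lists the matches with and without it. The helper names matches_anywhere and matches_at_word_end and the sample string are assumptions made for this illustration.

    ```perl
    use strict;
    use warnings;

    my $consonant = qr/[bcdfghjklmnpqrstvwxyz]/;

    # Returns an array ref of every "<consonant>uk" found, wherever it occurs.
    sub matches_anywhere {
        my ($text) = @_;
        return [ $text =~ /(${consonant}uk)/g ];
    }

    # Returns an array ref of every "<consonant>uk" that is followed by a word boundary.
    sub matches_at_word_end {
        my ($text) = @_;
        return [ $text =~ /(${consonant}uk\b)/g ];
    }

    # Sample string chosen for illustration: "buk" is not at the end of its word.
    my $sample = "bukti duduk";
    print "without \\b: ", join(" ", @{ matches_anywhere($sample) }),    "\n";
    print "with \\b:    ", join(" ", @{ matches_at_word_end($sample) }), "\n";
    ```

    Without the anchor, both "buk" and "duk" are found; with it, only the word-final "duk" remains, which is what restricts the change to the last syllable.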
    

    Now, let's use this in a substitution.

    s/([bcdfghjklmnpqrstvwxyz])uk\b/$1ok/g
    

    or

    s/(?<=[bcdfghjklmnpqrstvwxyz])uk\b/ok/g
    

    or

    s/[bcdfghjklmnpqrstvwxyz]\Kuk\b/ok/g
    

    The last one is the most efficient, but it requires Perl 5.10+. (That shouldn't be a problem given how old that release is.)
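
    For completeness, a minimal script that runs all three forms on the string from the question might look like this. The variable names are invented for the example; the patterns are the ones shown above.

    ```perl
    use 5.010;    # \K needs Perl 5.10 or later
    use strict;
    use warnings;

    my $text = "duduk, rujuk";

    # 1. Capture the consonant and put it back with $1.
    (my $with_capture = $text) =~ s/([bcdfghjklmnpqrstvwxyz])uk\b/$1ok/g;

    # 2. Assert the consonant with a lookbehind, so it is never part of the match.
    (my $with_lookbehind = $text) =~ s/(?<=[bcdfghjklmnpqrstvwxyz])uk\b/ok/g;

    # 3. Match the consonant, then use \K to keep it out of the replaced text.
    (my $with_keep = $text) =~ s/[bcdfghjklmnpqrstvwxyz]\Kuk\b/ok/g;

    print "$_\n" for $with_capture, $with_lookbehind, $with_keep;
    ```

    Each of the three lines printed should read "dudok, rujok", which is the output the question asks for.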