Search code examples
regexperlregex-lookarounds

Perl word disenvoweling: removing all vowels except the first and last


In order to shorten a number of names, but still keep them somewhat readable, I would like to remove all vowels from strings, except for the first and last occurrence. For example, I'd like 'Minnesota' to become 'Minnsta'.

my $name="Minnesota";

I've tried to make use of Perl's zero-width look-behind regex syntax, like so:

$name =~ s/(?<=[aeiou])([^aeiou]*)[aeiou]/$1/ig; # minnst

However, although this properly takes care of the first vowel, it removes the last one. To fix this, I tried to keep the last vowel in place, like this:

$name =~ s/(?<=[aeiou])([^aeiou]*)([aeiou])([aeiou][^aeiou]*)$/$1$3/ig; # minnesota

This also didn't work, presumably because the '$' anchors the whole regex to the end of the string.

Of course, I could look up the position of the first vowel, reverse the rest of the string and remove all vowels except for the 'first' (last), and re-reverse and concatenate the strings, but that's not very elegant. I feel I'm overlooking one of the options of the zero-width syntax.


Solution

  • Just specify a ending boundary condition for your regex: (?![^aeiou]*$):

    use strict;
    use warnings;
    
    my @words = qw(Minnesota concatenate strings elegant I feel overlooking options syntax any greatly appreciated);
    
    for (@words) {
        my $word = $_;
    
        $word =~ s/(?<=[aeiou])([^aeiou]*)[aeiou](?![^aeiou]*$)/$1/ig;
    
        printf "%-12s -> %s\n", $_, $word;
    }
    

    Outputs:

    Minnesota    -> Minnsta
    concatenate  -> conctnte
    strings      -> strings
    elegant      -> elgant
    I            -> I
    feel         -> feel
    overlooking  -> ovrlking
    options      -> optons
    syntax       -> syntax
    any          -> any
    greatly      -> greatly
    appreciated  -> apprcted