
Explain regular expression in Python


Can you help me understand this code line by line, please?

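import re

def word(w):
    # Leading run of non-vowels; the branch below reassigns pattern,
    # so this value is not used in the excerpt shown here.
    pattern = re.compile(r'^[^aeiouAEIOU]+')

    # Runs when w contains a lowercase 'y' with a non-vowel on each side.
    if re.findall(r'[^aeiouAEIOU]y[^aeiouAEIOU]', w):
        # Leading run of characters that are neither vowels nor 'y'.
        pattern = re.compile(r'^[^aeiouAEIOUy]+')
        beginning = re.findall(pattern, w)   # e.g. ['rh'] for 'rhythm'
        w = pattern.sub('', w)               # remove the leading run
        w += str(beginning[0]) + 'ay'        # move it to the end, add 'ay'
        return w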

The part I find confusing is: [^aeiouAEIOU]y[^aeiouAEIOU]

Thanks!


Solution

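  • The regular expression [^aeiouAEIOU]y[^aeiouAEIOU] breaks down into three parts, each matching exactly one character:

    • [^aeiouAEIOU] - any one character that is not a vowel (lowercase or uppercase)
    • y - the lowercase letter 'y'
    • [^aeiouAEIOU] - any one character that is not a vowel (lowercase or uppercase)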

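    Specifically, [aeiou] is a character class: it matches exactly one character, and that character must be one of a, e, i, o, u. A caret ^ placed first inside the brackets negates the class, so [^aeiou] matches any single character other than a lowercase vowel, including consonants, digits, spaces and punctuation. Adding AEIOU to the class excludes the uppercase vowels as well. Outside brackets, ^ means something different: in ^[^aeiouAEIOU]+ from the question, the first ^ anchors the match to the start of the string.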

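    Therefore the regex matches a lowercase "y" that has a non-vowel character immediately before it and immediately after it; the match is those three characters. Because both sides must consume a character, a "y" at the very start or end of the word (as in "yes" or "my") is not matched. In the question's code, re.findall returns a list of all such matches, and a non-empty list is truthy, so the if branch runs whenever the word contains at least one such group.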
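A quick way to check this is to run the pattern against a few words. The sketch below uses re.search, which returns the first match object or None; y_match is a hypothetical helper name and the sample words are arbitrary.

```python
import re

# The pattern from the question: non-vowel, lowercase 'y', non-vowel.
Y_BETWEEN_NON_VOWELS = re.compile(r'[^aeiouAEIOU]y[^aeiouAEIOU]')

def y_match(word):
    """Hypothetical helper: return the three-character match, or None."""
    m = Y_BETWEEN_NON_VOWELS.search(word)
    return m.group() if m else None

for word in ["rhythm", "style", "gym", "my", "yes", "crying", "Tyre", "TYPE", "fly by"]:
    # repr() makes a trailing space in a match visible
    print(word, "->", repr(y_match(word)))
```

"crying" does not match because the y is followed by the vowel i. "fly by" matches "ly " because the space counts as a non-vowel, and "Tyre" matches because an uppercase consonant is also a non-vowel. "TYPE" does not match because the y in the pattern is lowercase and matching is case-sensitive.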
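To follow the function itself line by line, it can help to trace its 'y' branch on a sample word. trace_y_branch below is a hypothetical helper, not part of the question's code: it repeats the same steps and prints the intermediate values. The trailing 'ay' suggests the snippet comes from a Pig Latin converter, but that is an inference from the code shown.

```python
import re

def trace_y_branch(w):
    """Hypothetical helper: repeat the question's 'y' branch and print each step."""
    # Step 1: is there a lowercase 'y' with a non-vowel on each side?
    if not re.findall(r'[^aeiouAEIOU]y[^aeiouAEIOU]', w):
        print(w, "-> no non-vowel/y/non-vowel group, branch skipped")
        return None

    # Step 2: the leading ^ anchors at the start of the string; the ^ inside
    # the brackets negates the class. Together: the longest run of leading
    # characters that are neither vowels nor 'y'.
    pattern = re.compile(r'^[^aeiouAEIOUy]+')
    beginning = pattern.findall(w)
    print("leading run:", beginning)
    if not beginning:
        # The question's code would raise IndexError on beginning[0] here.
        print(w, "-> starts with a vowel or 'y', nothing to move")
        return None

    # Step 3: delete the leading run from the word.
    rest = pattern.sub('', w)
    print("after removal:", rest)

    # Step 4: append the removed run plus 'ay'.
    result = rest + beginning[0] + 'ay'
    print("result:", result)
    return result

trace_y_branch("rhythm")
```

For "rhythm" the leading run is "rh", removing it leaves "ythm", and the result is "ythmrhay". A word such as "abyss" passes the first test (it contains "bys") but starts with a vowel, so the leading-run pattern finds nothing; the question's beginning[0] would then raise IndexError, which the guard in this sketch reports instead.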