Tags: java, regex, jflex

Vowel regexp in jflex


So I did an exercise using JFlex, which is about counting the number of words in an input text file that contain more than 3 vowels. What I ended up doing was defining a token for a word, and then writing a Java function that receives this text as input and checks each character. If it's a vowel, I increment a counter and then check whether it's greater than 3; if it is, I increment the counter of matching words.

What I want to know is whether there is a regexp that could match a word with more than 3 vowels. I think it would be a cleaner solution. Thanks in advance.

tokens

   Letra = [a-zA-Z]
   Palabra = {Letra}+
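For comparison, the approach described above (match each word, then count its vowels in Java) might look roughly like the sketch below. It is an assumption-laden stand-in: it runs the Palabra pattern through java.util.regex instead of a JFlex-generated lexer, and the class name, the helpers hasMoreThanThreeVowels and countWords, and the sample text are all hypothetical. In an actual JFlex spec the counting would presumably sit in the action for {Palabra}.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class VowelCountManual {
    // java.util.regex stand-in for the JFlex macros Letra = [a-zA-Z], Palabra = {Letra}+
    static final Pattern PALABRA = Pattern.compile("[a-zA-Z]+");

    // Hypothetical helper: true if the word has more than 3 vowels (a, e, i, o, u in either case)
    static boolean hasMoreThanThreeVowels(String word) {
        int vowels = 0;
        for (char c : word.toCharArray()) {
            if ("aeiouAEIOU".indexOf(c) >= 0) {
                vowels++;
                if (vowels > 3) {
                    return true; // stop early once the threshold is passed
                }
            }
        }
        return false;
    }

    // Hypothetical helper: counts the words in the text that have more than 3 vowels
    static int countWords(String text) {
        int count = 0;
        Matcher m = PALABRA.matcher(text);
        while (m.find()) {
            if (hasMoreThanThreeVowels(m.group())) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "Murcielago education banana sky aeiou";
        System.out.println(countWords(text)); // prints 3
    }
}
```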

Solution

  • Very simple. Use this if you want to check that a word contains at least 3 vowels (for more than 3 vowels, as the question asks, change {3} to {4}).

    (?i)(?:[a-z]*[aeiou]){3}[a-z]*
    

    You only care that it contains at least 3 vowels, so the rest can be any alphabetic characters. The regex above works both with String.matches and in a Matcher.find() loop, since a valid word (one with at least 3 vowels) cannot be a substring of an invalid word (one with fewer than 3 vowels).


    Slightly off-topic, but for consonants you can use character class intersection, [a-z&&[^aeiou]], which Java regex supports but many other regex flavors do not. So if you want to check for exactly 3 vowels (for String.matches):

    (?i)(?:[a-z&&[^aeiou]]*[aeiou]){3}[a-z&&[^aeiou]]*
    

    If you are using this in a Matcher.find() loop:

    (?i)(?<![a-z])(?:[a-z&&[^aeiou]]*[aeiou]){3}[a-z&&[^aeiou]]*(?![a-z])
    

    Note that I have to use look-arounds to make sure that the matched string (exactly 3 vowels) is not part of an invalid word (which is possible when the word has more than 3 vowels).
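A minimal sketch exercising the answer's patterns with java.util.regex follows. The {4} variant for "more than 3 vowels" is an assumption added to match the question; the class name, the findAll helper, and the sample text are hypothetical. The last call shows why the look-arounds matter: without them, find() reports a partial match inside a word that has 5 vowels.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class VowelRegexDemo {
    // At least 3 vowels (the answer's pattern); works with matches() or a find() loop
    static final Pattern AT_LEAST_3 = Pattern.compile("(?i)(?:[a-z]*[aeiou]){3}[a-z]*");
    // More than 3 vowels, as the question asks: same idea with {4} (assumed variant)
    static final Pattern MORE_THAN_3 = Pattern.compile("(?i)(?:[a-z]*[aeiou]){4}[a-z]*");
    // Exactly 3 vowels, for matches() on a single word
    static final Pattern EXACTLY_3 =
            Pattern.compile("(?i)(?:[a-z&&[^aeiou]]*[aeiou]){3}[a-z&&[^aeiou]]*");
    // Exactly 3 vowels, for a find() loop over running text (look-arounds pin whole words)
    static final Pattern EXACTLY_3_IN_TEXT =
            Pattern.compile("(?i)(?<![a-z])(?:[a-z&&[^aeiou]]*[aeiou]){3}[a-z&&[^aeiou]]*(?![a-z])");

    // Hypothetical helper: collects every find() match of the pattern in the text
    static List<String> findAll(Pattern p, String text) {
        List<String> result = new ArrayList<>();
        Matcher m = p.matcher(text);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "Murcielago education banana sky aeiou";
        System.out.println(findAll(MORE_THAN_3, text).size());      // 3
        System.out.println(findAll(MORE_THAN_3, text));             // [Murcielago, education, aeiou]
        System.out.println(AT_LEAST_3.matcher("banana").matches()); // true
        System.out.println(EXACTLY_3.matcher("banana").matches());  // true
        System.out.println(findAll(EXACTLY_3_IN_TEXT, text));       // [banana]
        // Without look-arounds, find() accepts a piece of a 5-vowel word
        System.out.println(findAll(EXACTLY_3, "aeiou"));            // [aei]
    }
}
```

With MORE_THAN_3 the word count falls out of a single find() loop, so no per-character Java function is needed.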