Write a lex program that detects and counts words consisting entirely of capital letters

%{
    int capital_count = 0;
%}

%%
[A-Z]+[^a-z][ \t\n]   { capital_count++; }

.       ;   // Ignore other characters

%%

int main() {
    yylex();
    printf("Number of capital words: %d\n", capital_count);
    return 0;
}

This is my code to detect words written entirely in capital letters, but it gives the wrong count for mixed-case words such as "tODAY", "ToDAY", or "TOdAY".

How do I write a regular expression that matches only words consisting entirely of capital letters?

Solution

The regular expression for uppercase words is just [A-Z]+.

The problem is that it will also match parts of words that are not entirely uppercase. To prevent this in lex, add a second rule that catches all words, [A-Za-z]+, and place it after the uppercase rule. Lex always looks for the longest match, and when several rules match the same amount of text it picks the earliest one. So the first rule wins only for a word that is uppercase throughout. For "TODAY" both rules match five characters and the first rule is chosen. For "TOdAY" the uppercase rule can match only "TO", while the general rule matches all five characters and wins.
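
Putting the two rules together, a complete program might look like the sketch below. It assumes flex: the `%option noyywrap` line is flex-specific, and with classic lex you would instead link against the lex library or supply your own yywrap(). It also assumes a "word" means a run of ASCII letters only, so digits and punctuation act as separators.

```lex
%option noyywrap

%{
#include <stdio.h>

int capital_count = 0;  /* number of words made up only of A-Z */
%}

%%
[A-Z]+      { capital_count++; /* the whole word is uppercase */ }
[A-Za-z]+   { /* any other word: consume it whole, do not count it */ }
.|\n        { /* ignore everything else, including newlines */ }
%%

int main(void) {
    yylex();
    printf("Number of capital words: %d\n", capital_count);
    return 0;
}
```

If the file is saved as caps.l (a hypothetical name), it can be built with `flex caps.l && cc lex.yy.c -o caps`. Feeding it the line `TODAY tODAY ToDAY TOdAY HELLO` prints `Number of capital words: 2`, because only TODAY and HELLO are matched by the first rule. The last rule uses `.|\n` rather than `.` alone because `.` does not match a newline, and unmatched input would otherwise be echoed to the output by the default rule.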