
Match the input with string using lex


I'm trying to match any prefix of the string Something. For example, the inputs So, SOM, SomeTH, some, and S should all be accepted, because they are all prefixes of Something.

My code

Ss[oO]|Ss[omOMOmoM] {
        printf("Accept Something: %s\n", yytext);
}

Input

Som

Output

Accept Something: So
Invalid Character

It's supposed to read Som because it is a prefix of Something. I don't get why my code doesn't work. Can anyone tell me what I am doing wrong?


Solution

  • I don't know what you think the meaning of

    Ss[oO]|Ss[omOMOmoM]
    

    is, but what it matches is either:

    • an S followed by an s followed by exactly one of the letters o or O, or
    • an S followed by an s followed by exactly one of the letters o, O, m or M. Putting a symbol more than once inside a bracket expression has no effect.

    Also, I don't see how that could produce the output you report. Perhaps there was a copy-and-paste error, or perhaps you have other pattern rules.

    If you want to match prefixes, use nested optional matches:

    s(o(m(e(t(h(i(ng?)?)?)?)?)?)?)?
    

    If you want case-insensitive matches, you could write out all the character classes, but that gets tiresome; it is simpler to use the case-insensitive flag:

    (?i:s(o(m(e(t(h(i(ng?)?)?)?)?)?)?)?)
    

    (?i: turns on the case-insensitive flag, which stays in effect until the matching close parenthesis.

    In practice, this is probably not what you want. Normally, you will want to recognise a complete word as a token. You could then check to see if the word is a prefix in the rule action:

    [[:alpha:]]+    { /* needs <string.h> (strlen) and <strings.h> (strncasecmp) in the definitions section */
                      if (yyleng <= strlen("something") && 0 == strncasecmp(yytext, "something", yyleng)) {
                        /* do something */
                      }
                    }
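
If the check is needed in more than one rule, it may be easier to read when pulled out into a small function. The following is a minimal sketch, assuming a POSIX system (for `strncasecmp`); the name `is_prefix_of` is made up for illustration.

```c
#include <stddef.h>
#include <string.h>   /* strlen */
#include <strings.h>  /* strncasecmp (POSIX, not ISO C) */

/* Hypothetical helper: nonzero if the first `len` characters of `text`
 * form a non-empty, case-insensitive prefix of `word`. */
int is_prefix_of(const char *text, size_t len, const char *word)
{
    return len > 0
        && len <= strlen(word)
        && strncasecmp(text, word, len) == 0;
}
```

Inside a rule action it could be called as `is_prefix_of(yytext, (size_t)yyleng, "something")`. The length test comes first so that a word longer than the target, such as somethings, is rejected before any characters are compared.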
    

    There is lots of information in the Flex manual.