Matching words in ANSI C


How can I match a word (1 to n characters) in ANSI C? In addition, what is the pattern to match a constant in C source code?

I tried reading the file and passing it to regexec() from regex.h. The problem: the tool I'm writing should read source code, find every constant (#define macro) it uses, and check whether each one is defined.

The pattern I used for testing is [a-zA-Z_0-9]{1,}, but it also matches fragments such as the "h" in "test.h".


Solution

  • Identifiers must start with a letter or underscore, so the pattern is

    [A-Za-z_][A-Za-z0-9_]*
    

    I know of no syntactic difference between C identifiers and preprocessor identifiers. By convention, macro names are written in upper case and ordinary C identifiers in lower case, but nothing requires it. Unless your macros are guaranteed to follow a distinct naming convention, you would essentially have to find every identifier in the source file and in every file it includes, and sort them into preprocessor identifiers, C identifiers, and undeclared identifiers.

    From the GCC manual:

    Preprocessing tokens fall into five broad classes: identifiers, preprocessing numbers, string literals, punctuators, and other. An identifier is the same as an identifier in C: any sequence of letters, digits, or underscores, which begins with a letter or underscore. Keywords of C have no significance to the preprocessor; they are ordinary identifiers. You can define a macro whose name is a keyword, for instance. The only identifier which can be considered a preprocessing keyword is "defined".