regex, python-re

Regular expression to find words that do not start with a digit


I need to match the identifiers in a string.

Pattern

A word is considered an identifier if:

  1. The word contains no characters other than alphanumeric characters.
  2. The word doesn't start with a digit.

Input

The given input string will not contain any leading or trailing spaces or other whitespace characters.

Code

I tried using the following regular expressions:

  1. \D[a-zA-Z]\w*\D
  2. [ \t\n][a-zA-Z]\w*[ \t\n]
  3. ^\D[a-zA-Z]\w*$

None of them works.

How can I achieve this?

Note: I want to match a string that contains one or more identifiers. For example, for the input This is an i0dentifier 1abs, the expected results are This, is, an and i0dentifier.


Solution

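  • Note that in your ^\D[a-zA-Z]\w*$ regex, \D matches any non-digit character, so it can also match non-alphanumeric characters, and \w also matches underscores, which are not alphanumeric. The ^ and $ anchors also force the whole string to be a single word, so a string with several words never matches.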
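
    The behaviour described above can be checked with a short sketch. The test strings and the name attempt are illustrative assumptions, not part of the question:

```python
import re

# The third pattern from the question, compiled unchanged.
attempt = re.compile(r'^\D[a-zA-Z]\w*$')

# \D accepts any non-digit, so a leading '$' slips through.
print(bool(attempt.match('$abc')))   # True

# \w accepts '_', so underscores are allowed after the first two characters.
print(bool(attempt.match('ab_c')))   # True

# \D and [a-zA-Z] each consume one character, so a one-letter word fails.
print(bool(attempt.match('a')))      # False

# ^ and $ require the whole string to be one word.
print(bool(attempt.match('This is an i0dentifier 1abs')))  # False
```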

    I suggest

    \b[A-Za-z][A-Za-z0-9]*\b
    

    It matches

    • \b - word boundary
    • [A-Za-z] - one ASCII letter (the identifier must start with a letter)
    • [A-Za-z0-9]* - zero or more ASCII letters/digits
    • \b - word boundary.

    For the sample string, this returns This, is, an and i0dentifier. 1abs is skipped because there is no word boundary between 1 and a.

    In Python:

    import re
    identifiers = re.findall(r'\b[A-Za-z][A-Za-z0-9]*\b', text)
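
    As a quick check against the example in the question, here is a minimal sketch. It assumes an identifier is one ASCII letter followed by zero or more ASCII letters or digits; IDENTIFIER and find_identifiers are hypothetical names used only for illustration:

```python
import re

# Assumed pattern: one ASCII letter, then any number of ASCII letters/digits,
# delimited by word boundaries on both sides.
IDENTIFIER = re.compile(r'\b[A-Za-z][A-Za-z0-9]*\b')


def find_identifiers(text):
    """Return every identifier-like word found in text (hypothetical helper)."""
    return IDENTIFIER.findall(text)


print(find_identifiers('This is an i0dentifier 1abs'))
# ['This', 'is', 'an', 'i0dentifier']

# Words containing an underscore are rejected as a whole, because '_' is a
# word character and so no word boundary exists next to it.
print(find_identifiers('foo_bar x1'))
# ['x1']
```

    Note that \b treats punctuation as a separator, so foo-bar would yield foo and bar. If such words should be rejected as a whole, one alternative is to split the input on whitespace and test each word with re.fullmatch.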