I need to match a string with an identifier.
Any word will be considered as identifier if
The given input string will not contain any preceding or trailing spaces or white-space characters.
I tried the following regular expressions:
\D[a-zA-Z]\w*\D
[ \t\n][a-zA-Z]\w*[ \t\n]
^\D[a-zA-Z]\w*$
None of them works.
How can I achieve this?
Note: I want to match a string that contains one or more identifiers. For example, in the string "This is an i0dentifier 1abs", the expected results are i0dentifier, This, is, and an.
Note that in your ^\D[a-zA-Z]\w*$ regex, \D can match non-alphanumeric characters, since \D matches any non-digit character, and \w also matches underscores, which are not alphanumeric characters.
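To make those two points concrete, here is a small check of the third pattern from the question. The test strings ('-ab', 'ab_c', 'a') are made up for illustration and do not come from the original post.

```python
import re

# The third pattern from the question, unchanged.
pattern = re.compile(r'^\D[a-zA-Z]\w*$')

# \D accepts any non-digit, so a leading hyphen gets through.
hyphen_ok = pattern.match('-ab') is not None

# \w accepts underscores as well as letters and digits.
underscore_ok = pattern.match('ab_c') is not None

# \D and [a-zA-Z] each consume one character, so a one-letter word cannot match.
single_letter_ok = pattern.match('a') is not None

print(hyphen_ok, underscore_ok, single_letter_ok)  # True True False
```

The anchors ^ and $ also force the pattern to cover the whole input, so it cannot pick individual words out of a longer string.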
I suggest
\b[A-Za-z]+[0-9][A-Za-z0-9]*\b
It matches:
\b - word boundary
[A-Za-z]+ - one or more letters (the identifier should start with a letter)
[0-9] - a digit (required)
[A-Za-z0-9]* - zero or more ASCII letters/digits
\b - word boundary.
See the regex demo.
In Python:
import re
identifiers = re.findall(r'\b[A-Za-z]+[0-9][A-Za-z0-9]*\b', text)
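As a quick sanity check, the sketch below runs the suggested pattern on the sample string from the question. The second pattern is not part of the answer: it is an assumed variant that drops the required digit, shown only because the question lists This, is, and an among the expected results. The variable names are illustrative.

```python
import re

text = 'This is an i0dentifier 1abs'  # sample string from the question

# Suggested pattern: letters first, then a required digit, then letters/digits.
with_digit = re.findall(r'\b[A-Za-z]+[0-9][A-Za-z0-9]*\b', text)

# Assumed variant (not from the answer): starts with a letter, digit optional.
letter_first = re.findall(r'\b[A-Za-z][A-Za-z0-9]*\b', text)

print(with_digit)    # ['i0dentifier']
print(letter_first)  # ['This', 'is', 'an', 'i0dentifier']
```

In both cases 1abs is skipped, because there is no word boundary between 1 and a, so a match cannot start in the middle of that word.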