I am new to flex and I want to design a scanner with it.
At this step, I want to write a regular expression that matches an id, subject to these conditions:
An underscore can appear in an id.
You can use _ wherever you want, but at most 2 underscores may appear consecutively. For example:
a_b_c -> valid
a___b -> invalid
123abv -> invalid
Digits can't appear at the beginning of an id.
An underscore can't appear at the end of an id.
The regular expression I have written for this is:
(\b(_{0,2}[A-Za-z][0-9A-Za-z]*(_{0,2}[0-9A-Za-z]+)*)\b)
But now I have 2 questions:
Is the regular expression correct? I have tested it on rubular.com and I think it is, but I'm not sure.
The other, more important problem is that when I put this in my flex file, no id is recognized, and I can't see why.
Can anyone please help me?
The problem here is your ID regular expression. You are using \b to match a word boundary, but Flex's regular expressions have no built-in support for matching word boundaries (in a Flex pattern, \b is the C escape for a backspace character). Other than that, your regular expression is sound. I was able to get your code working using this modified version of yours: _{0,2}[A-Za-z][0-9A-Za-z]*(_{0,2}[0-9A-Za-z]+)*. (I just got rid of the \b's and some of the parentheses that bothered me.)
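For illustration, here is a minimal sketch of a Flex file that uses the pattern without \b. The original scanner is not shown, so the printf action, the whitespace rule, and the main function are assumptions made only to get something that builds on its own.

```lex
%{
/* Minimal sketch: everything except the ID pattern is an assumption. */
#include <stdio.h>
%}
%option noyywrap

%%
_{0,2}[A-Za-z][0-9A-Za-z]*(_{0,2}[0-9A-Za-z]+)*   { printf("ID: %s\n", yytext); }
[ \t\r\n]+                                        { /* skip whitespace */ }
%%

int main(void)
{
    /* Reads from stdin; anything no rule matches is echoed by Flex's default rule. */
    yylex();
    return 0;
}
```

Assuming the file is saved as scanner.l (a hypothetical name), it can be built with `flex scanner.l` followed by `cc lex.yy.c -o scanner`.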
Unfortunately, this causes a slight problem. Say that you're lexing and run across something like 12_345. Flex will read 12, assume that it found an IC, and then read _. Finding no match, it will print that to stdout, then read 345 as another IC.
In order to avoid this issue (caused by Flex's lack of word boundaries), you could do one of two things:
1. Add a catch-all rule that reports an error for any character that no other rule matches, such as the _ in the example above.
2. Add a rule, placed after the others, that matches any run of identifier characters ([_0-9A-Za-z]+). If it is matched, give an error. This will cause Flex to return the entire token 12_345 as an error in the above example.
One other problem: the ID regular expression still won't match anything with underscores at the end of it. This means your current regular expression isn't perfect, and you'll need to do some tweaking with it, but now you know not to use the \b symbol. Here is a reference on Flex's regular expression syntax so you can find other things to use/avoid.
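As a sketch of the second option, the rules could be laid out as below. The IC pattern [0-9]+, the error messages, and the main function are assumptions, since the original rules are not shown. The idea relies on two documented Flex behaviors: the longest match wins, and on a tie the rule listed first wins.

```lex
%{
/* Sketch only: the IC pattern and the actions are assumptions. */
#include <stdio.h>
%}
%option noyywrap

%%
_{0,2}[A-Za-z][0-9A-Za-z]*(_{0,2}[0-9A-Za-z]+)*   { printf("ID: %s\n", yytext); }
[0-9]+           { printf("IC: %s\n", yytext); }
[_0-9A-Za-z]+    { fprintf(stderr, "error: invalid token '%s'\n", yytext); }
[ \t\r\n]+       { /* skip whitespace */ }
.                { fprintf(stderr, "error: unexpected character '%s'\n", yytext); }
%%

int main(void)
{
    yylex();   /* reads from stdin until end of input */
    return 0;
}
```

With this ordering, a_b_c and 123 tie in length with the error rule, so the earlier ID and IC rules win. For 12_345, a___b, 123abv, or abc_, the error rule matches a longer string than ID or IC can, so the whole token should be reported as invalid. That also covers the trailing underscore case without changing the ID pattern itself.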