How do I restrict what comes before and after a regex in Lex?

I have to create a regular expression to identify emails. Here is how it looks so far:

[A-Za-z0-9]+([._-]*[A-Za-z0-9]+)*[@]+[A-Za-z0-9]+([._-]*[A-Za-z0-9]+)*(.com)*

What I want is for this regex to match an email address, with the restriction that the address can't start or end with a non-alphanumeric character. So:

.ilikestack@gmail.com or ilikestack@gmail.com_ = invalid
ilike.stack@gmail = valid

But when I run my Lex program, the first two emails above are considered valid, and I can't figure out how to change this.

Solution

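The usual way to control what can and can't appear before and after a regex is to define one or more additional regexes that match the same thing surrounded by invalid characters. Because Lex always picks the rule with the longest match, these longer rules win whenever the invalid context is present, and their actions can simply discard the input.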

So if we had the regex [a-z]+, but only wanted it to match when it is preceded by white space (or the beginning of the file) and followed by white space, a dot, or the end of the file, we could accomplish that as follows:

[a-z]+                printf("Successful match: '%s'!\n", yytext);
[^a-z \t\r\n][a-z]+   ;
[a-z]+[^a-z \t\r\n.]  ;
.                     ;

Then the input ab cd_ ef. .de fg would produce the output:

Successful match: 'ab'!
Successful match: 'ef'!
Successful match: 'fg'!
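
For anyone who wants to try this, here is a minimal sketch of those rules inside a complete scanner. It assumes flex (%option noyywrap is flex-specific); the main function and the file name demo.l are my additions, and the last rule is widened to .|\n so that newlines are swallowed instead of echoed by the default rule.

```lex
%option noyywrap
%{
/* demo.l: hypothetical file name, used only for this illustration */
#include <stdio.h>
%}
%%
[a-z]+                printf("Successful match: '%s'!\n", yytext);
[^a-z \t\r\n][a-z]+   ;  /* word preceded by an invalid character: discard */
[a-z]+[^a-z \t\r\n.]  ;  /* word followed by an invalid character: discard */
.|\n                  ;  /* everything else, including newlines: discard */
%%
int main(void)
{
    yylex();   /* scan stdin until end of file */
    return 0;
}
```

Built with something like flex demo.l && cc lex.yy.c -o demo, running echo 'ab cd_ ef. .de fg' | ./demo should print the three lines shown above. For cd_, the third rule matches three characters while the first matches only two, so the longer match wins and the word is discarded.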

For your use case, the simplest solution would be to add two rules: one for words that start with a non-alphanumeric, non-whitespace character and extend to the next white space character, and one for words that end with a non-alphanumeric character that isn't a dot (or anything else that's allowed to appear after e-mails):

[^ \t\r\nA-Za-z0-9][^ \t\r\n]*   ;
[^ \t\r\n]*[^ \t\r\nA-Za-z0-9.]  ;
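
Put together with the regex from the question, a scanner could look roughly like this. Again this is only a sketch assuming flex: the EMAIL definition name, the "valid:" message, the catch-all rule and main are my own additions, and the email regex is copied from the question unchanged (note that the dot in (.com)* is unescaped there and so matches any character; \. is probably what was intended).

```lex
%option noyywrap
%{
#include <stdio.h>
%}
EMAIL   [A-Za-z0-9]+([._-]*[A-Za-z0-9]+)*[@]+[A-Za-z0-9]+([._-]*[A-Za-z0-9]+)*(.com)*
%%
{EMAIL}                           printf("valid: %s\n", yytext);
[^ \t\r\nA-Za-z0-9][^ \t\r\n]*    ;  /* word starting with a symbol: discard */
[^ \t\r\n]*[^ \t\r\nA-Za-z0-9.]   ;  /* word ending with a symbol other than a dot: discard */
.|\n                              ;  /* anything else: discard */
%%
int main(void)
{
    yylex();   /* scan stdin until end of file */
    return 0;
}
```

With the input .ilikestack@gmail.com ilikestack@gmail.com_ ilike.stack@gmail, only the third address should be reported. The first word is consumed whole by the second rule. For the second word, the email regex matches the 20 characters up to the underscore, but the third rule matches all 21, so it wins. If addresses may be followed by other punctuation such as a comma, add those characters next to the dot in the last character class of the third rule.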