I was writing a small wrapper for nullmailer when I noticed what is, IMHO, unwanted behavior in grep. In particular, I noticed something strange with @ characters.
It breaks strings containing @ and produces wrong output.
TL;DR
E-mail addresses have rules to follow (e.g. RFC 2822), but I will use a deliberately wrong regular expression for them, just to keep things a bit shorter. Note that this does not change the problem I'm asking about.
I am using e-mail addresses in this post, but the problem obviously applies to every string with at least one @ in it.
I wrote a small script to help me explain what I "found":
#!/bin/bash
# Pattern expected to match the whole address.
funct1() {
    arr=([email protected] [email protected])
    regex="[[:alnum:]]*@[[:alpha:]]*\.[[:alpha:]]\{2,\}"
    for dest in "${arr[@]}"; do
        printf "%s\n" "$dest" | grep -o -e "$regex"
    done
}
# Local part restricted to letters only.
funct2() {
    arr=([email protected] [email protected])
    regex="[[:alpha:]]*@[[:alpha:]]*\.[[:alpha:]]\{2,\}"
    for dest in "${arr[@]}"; do
        printf "%s\n" "$dest" | grep -o -e "$regex"
    done
}
# Two @, first and second parts restricted to letters only.
funct3() {
    arr=(local1@[email protected] local2@[email protected])
    regex="[[:alpha:]]*@[[:alpha:]]*@[[:alpha:]]*\.[[:alpha:]]\{2,\}"
    for dest in "${arr[@]}"; do
        printf "%s\n" "$dest" | grep -o -e "$regex"
    done
}
# Two @, only the first part restricted to letters.
funct4() {
    arr=(local1@[email protected] local2@[email protected])
    regex="[[:alpha:]]*@[[:alnum:]]*@[[:alpha:]]*\.[[:alpha:]]\{2,\}"
    for dest in "${arr[@]}"; do
        printf "%s\n" "$dest" | grep -o -e "$regex"
    done
}
printf "One @, all parts of regex right:\n"
funct1
printf "One @, first part of regex wrong:\n"
funct2
printf "Two @, first and second part of regex wrong:\n"
funct3
printf "Two @, first part of regex wrong:\n"
funct4
exit 0
To better understand the problem, I used two types of strings, [email protected] and local1@[email protected], and it seems to me that grep does not behave correctly with strings containing at least one @.
The output is:
One @, all parts of regex right:
[email protected]
[email protected]
One @, first part of regex wrong:
@domain.tld
@domain.tld
Two @, first and second part of regex wrong:
Two @, first part of regex wrong:
@[email protected]
@[email protected]
funct1 has a regular expression that matches the entire strings, so there is no problem: all of them are printed.
funct2 has a regular expression that matches only the part of the strings from the @ to the end, so I would expect no output, because the expression is wrong; instead, I get the second part of the strings...
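A minimal sketch of what happens with funct2's pattern, assuming GNU (or a compatible) grep and a hypothetical address local1@domain.tld, since the real addresses in the post are obfuscated. Without anchoring, -o prints just the matched substring; with -x, grep only accepts a line when the whole line matches, so nothing is printed.

```shell
#!/bin/bash
# Hypothetical input: the digit in "local1" is not [[:alpha:]].
addr='local1@domain.tld'
regex='[[:alpha:]]*@[[:alpha:]]*\.[[:alpha:]]\{2,\}'   # funct2's pattern

# Unanchored: any matching substring is enough, and -o prints only that part.
printf '%s\n' "$addr" | grep -o -e "$regex"    # prints @domain.tld

# -x: the whole line must match, so grep prints nothing and exits with status 1.
if ! printf '%s\n' "$addr" | grep -x -e "$regex"; then
    echo "no whole-line match"
fi
```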
That is why I decided to add a second @ to the strings and do some more tests.
funct3 matches only the part of the strings from the second @ to the end, so I would expect no output at all because of the mistake in the regex. OK, no output.
funct4, instead, has a regular expression that matches only the part of the strings from the first @ to the end, so I would expect it not to show me anything; instead, I get the output starting from the first @, just as with funct2.
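To see where the match actually begins, GNU grep's -b option, combined with -o, prints the 0-based byte offset of each matched part. A sketch, assuming GNU grep and a hypothetical string local1@local1@domain.tld modelled on the funct4 input:

```shell
#!/bin/bash
# Hypothetical input modelled on the (obfuscated) funct4 strings.
addr='local1@local1@domain.tld'
regex='[[:alpha:]]*@[[:alnum:]]*@[[:alpha:]]*\.[[:alpha:]]\{2,\}'   # funct4's pattern

# -b with -o prints "offset:match"; here the offset is 6, i.e. the first @.
printf '%s\n' "$addr" | grep -o -b -e "$regex"    # prints 6:@local1@domain.tld
```

The leading "local1" is skipped because [[:alpha:]]* cannot consume the digit 1, so the earliest position where the whole pattern fits is the first @ itself.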
Except for funct1, I shouldn't get any output at all, am I right?
Why does grep break the result at the first @?
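A quick way to check whether @ plays any special role is to repeat the experiment with a different separator. A sketch using a colon and a made-up string abc1:def (not taken from the post):

```shell
#!/bin/bash
# Made-up input: "1" is not [[:alpha:]], just like in local1@...
printf '%s\n' 'abc1:def' | grep -o -e '[[:alpha:]]*:[[:alpha:]]*'    # prints :def
```

The output is cut at the colon in exactly the same way, which suggests the effect comes from how the pattern is matched, not from the @ character.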
I consider it unwanted behavior because, this way, the result consists of strings that don't match my expression entirely.
Am I missing something?
EDIT: deleted the undefined-behavior tag.
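Your regex has issues; grep is working as designed. The pattern is not anchored with ^ and $, so grep accepts a line as soon as any substring of it matches, and -o prints only that substring. Because [[:alpha:]]* may also match zero characters, the earliest place where funct2's pattern fits can be the @ itself, which is why the output starts there. You could also simply count the number of @ characters as an additional test. Personally, I would create a boolean function like this: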
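Counting the @ characters can be done in pure bash with parameter expansion. A sketch; the helper name count_at is hypothetical and not part of the original answer:

```shell
#!/bin/bash
# Hypothetical helper: print how many @ characters "$1" contains.
count_at() {
    local only_at="${1//[!@]/}"   # delete every character that is not @
    printf '%d\n' "${#only_at}"   # the remaining length is the @ count
}

count_at 'local1@domain.tld'          # prints 1
count_at 'local1@local1@domain.tld'   # prints 2
```

A single address should then give exactly 1, e.g. `[[ $(count_at "$dest") -eq 1 ]]` inside the loop.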
#!/bin/bash
# -- is email address valid ? --
function isEmailValid() {
    # ^ and $ force the whole string to match, not just a substring.
    # grep -E replaces the deprecated egrep; [._-] replaces (\.|\-|\_).
    printf '%s\n' "$1" | grep -E -q '^([A-Za-z]+[A-Za-z0-9]*([._-]?[A-Za-z]+[A-Za-z0-9]*){1,})@(([A-Za-z]+[A-Za-z0-9]*)+([._-]?([A-Za-z]+[A-Za-z0-9]*)+){1,})+\.([A-Za-z]{2,})+$'
}
if isEmailValid "_#@[email protected]"; then
    echo "VALID"
else
    echo "INVALID"
fi
if isEmailValid "[email protected]"; then
    echo "VALID"
else
    echo "INVALID"
fi
Or, more simply, with bash's built-in =~ operator:
function isEmailValid() {
    # Same pattern; ^ and $ make =~ test the whole string.
    local regex='^([A-Za-z]+[A-Za-z0-9]*([._-]?[A-Za-z]+[A-Za-z0-9]*){1,})@(([A-Za-z]+[A-Za-z0-9]*)+([._-]?([A-Za-z]+[A-Za-z0-9]*)+){1,})+\.([A-Za-z]{2,})+$'
    [[ "${1}" =~ $regex ]]
}
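The same substring rule applies to bash's own =~ operator, which is why anchoring the pattern matters there too. A sketch, using the question's funct2 pattern rewritten as an ERE and a hypothetical address local1@domain.tld; after a successful test, BASH_REMATCH[0] holds the part of the string that actually matched:

```shell
#!/bin/bash
addr='local1@domain.tld'                                  # hypothetical address
loose='[[:alpha:]]*@[[:alpha:]]*\.[[:alpha:]]{2,}'        # unanchored
strict='^[[:alpha:]]*@[[:alpha:]]*\.[[:alpha:]]{2,}$'     # anchored

if [[ $addr =~ $loose ]]; then
    echo "loose matched: ${BASH_REMATCH[0]}"    # loose matched: @domain.tld
fi
if [[ ! $addr =~ $strict ]]; then
    echo "strict: no match"
fi
```

Keeping the regex in a variable and leaving `$loose`/`$strict` unquoted on the right of =~ is deliberate: quoting it would make bash compare it as a literal string.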