I am trying to find spammers in the exim mainlog. The mainlog has mail IDs and subjects like the ones below.
[email protected] S==thi#s i $s @a Su~bJec%t
[email protected] S==thi#s i ^s an*ot+her Su~bj)ec%t
What I am trying to do is take the subject, remove all the symbols and spaces using sed, and grep for keywords. If a keyword matches, print the mail ID.
I can remove all the symbols and spaces and grep for the keywords, but the problem is that the symbols in the mail IDs (@ and .) are also removed.
So my question is how to apply sed and grep only to the subjects (for example S==thi#s i ^s an*ot+her Su~bj)ec%t) and, if a keyword matches, print the mail ID without affecting its symbols.
Thanks in advance.
This would be tricky with sed, if even possible. If you're ok with awk instead:
awk -F' S==' -v k1=this '{gsub("[][()#$@~% ]", "", $2); if ($2 ~ k1) print $1}'
If you want to remove all non-alphanumeric characters, then it's better to write like this:
awk -F' S==' -v k1=this '{gsub("[^[:alnum:]]", "", $2); if ($2 ~ k1) print $1}'
If your version of awk doesn't support [:alnum:] then you can write like this instead:
awk -F' S==' -v k1=this '{gsub("[^a-zA-Z0-9]", "", $2); if ($2 ~ k1) print $1}'
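As a quick check, the command can be fed two lines shaped like the ones in the question. This is only a sketch: the addresses user1@example.com and user2@example.com are hypothetical placeholders, and the keyword "another" is chosen just for illustration.

```shell
# Hypothetical sample lines; the addresses are placeholders, not real log data.
sample='user1@example.com S==thi#s i $s @a Su~bJec%t
user2@example.com S==thi#s i ^s an*ot+her Su~bj)ec%t'

# Strip every non-alphanumeric character from the subject field only,
# then print the mail ID when the cleaned subject contains the keyword.
result=$(printf '%s\n' "$sample" |
  awk -F' S==' -v k1=another '{gsub("[^a-zA-Z0-9]", "", $2); if ($2 ~ k1) print $1}')

printf '%s\n' "$result"
```

Only the second line's subject contains "another" once the symbols are gone, so only its mail ID is printed, with @ and . intact.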
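Explanation:
- Use ' S==' (with the leading space) as the field separator to split the mail ID and subject parts
- Set the keyword to search for in the k1 variable. You could use any other keyword, or multiple keywords with more -v parameters in the same format, for example -v k2=something
- Use gsub to remove the unwanted characters from the second field (= the subject)
- If the second field matches k1, then print the first field (= the mail ID)
I hope this helps.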
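For the multiple-keyword case, one possible way to combine a second -v variable is with || in the condition. This is a sketch under assumptions: the sample addresses, the third log line, and the keywords "another" and "SubJect" are all made up for illustration.

```shell
# Hypothetical sample lines; addresses and keywords are placeholders.
sample='user1@example.com S==thi#s i $s @a Su~bJec%t
user2@example.com S==thi#s i ^s an*ot+her Su~bj)ec%t
user3@example.com S==un(rel)ated me$$age'

# Print the mail ID when the cleaned subject matches k1 or k2.
matches=$(printf '%s\n' "$sample" |
  awk -F' S==' -v k1=another -v k2=SubJect \
    '{gsub("[^a-zA-Z0-9]", "", $2); if ($2 ~ k1 || $2 ~ k2) print $1}')

printf '%s\n' "$matches"
```

The match is case-sensitive, so k2 matches the first line ("SubJect") but not the second ("Subject"); the second line is printed because of k1, and the third line matches neither keyword.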