
Print part of a line if a sed and grep operation on it succeeds


I am trying to find spammers in the exim mainlog. The mainlog contains mail IDs and subjects, something like this:

[email protected] S==thi#s i $s @a Su~bJec%t
[email protected] S==thi#s i ^s an*ot+her Su~bj)ec%t

What I am trying to do is take the subject, strip all symbols and spaces with sed, and grep the result for keywords. If a keyword matches, the mail ID should be printed. I can already strip the symbols and spaces and grep for the keywords, but the symbols in the mail IDs (@ and .) get removed as well. So my question is: how do I apply sed and grep only to the subject (for example S==thi#s i ^s an*ot+her Su~bj)ec%t) and, if it matches, print the mail ID with its symbols intact? Thanks in advance.


Solution

  • This would be tricky with sed, if it is possible at all. If you're OK with awk instead (append the log file name to the command, or pipe the log into it):

    awk -F' S==' -v k1=this '{gsub("[][()#$@~% ]", "", $2); if ($2 ~ k1) print $1}'
    

    The bracket expression above removes only the symbols it lists. If you want to remove all non-alphanumeric characters, it's better to write it like this:

    awk -F' S==' -v k1=this '{gsub("[^[:alnum:]]", "", $2); if ($2 ~ k1) print $1}'
    

    If your version of awk doesn't support [:alnum:], you can write it like this instead:

    awk -F' S==' -v k1=this '{gsub("[^a-zA-Z0-9]", "", $2); if ($2 ~ k1) print $1}'
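
    To see what the substitution does before any keyword test is applied, here is a minimal sketch that prints each mail ID next to its cleaned subject. The addresses are hypothetical stand-ins in the same "mail ID, then S==subject" shape as the question; real mainlog lines may carry more fields.

    ```shell
    # Hypothetical sample lines: mail ID, then " S==", then the noisy subject.
    sample=$(printf '%s\n' \
      'alice@example.com S==thi#s i $s @a Su~bJec%t' \
      'bob@example.org S==thi#s i ^s an*ot+her Su~bj)ec%t')

    # Clean only the subject ($2); the mail ID ($1) is never touched by gsub.
    cleaned=$(printf '%s\n' "$sample" |
      awk -F' S==' '{gsub("[^a-zA-Z0-9]", "", $2); print $1 " -> " $2}')

    printf '%s\n' "$cleaned"
    ```

    Because gsub is given $2 as its target, the @ and . in the mail ID stay in place while the subject collapses to letters and digits only.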
    

    Explanation:

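    • Using " S==" (including the leading space) as the field separator to split each line into the mail ID and the subject
    • Passing the keyword "this" in the k1 variable. You could use any other keyword, or pass several keywords with more -v parameters in the same format, for example -v k2=something. Each extra variable also needs its own test in the condition, such as $2 ~ k1 || $2 ~ k2. The keyword is treated as a regular expression and the match is case-sensitive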
    • Remove all the symbols from the 2nd field with gsub
    • If the 2nd field matches the keyword in k1, then print the first field (= the mail ID)
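
    The steps above can be sketched end to end with two keywords. This is only an illustration under assumptions: the addresses, the third sample line, and the keywords "another" and "report" are made up for the example and are not taken from a real log.

    ```shell
    # Hypothetical sample lines in the "mail ID S==subject" shape.
    sample=$(printf '%s\n' \
      'alice@example.com S==thi#s i $s @a Su~bJec%t' \
      'bob@example.org S==thi#s i ^s an*ot+her Su~bj)ec%t' \
      'carol@example.net S==we#ekly re~port')

    # Strip non-alphanumerics from the subject, then print the mail ID
    # when the cleaned subject matches either keyword (k1 or k2).
    matches=$(printf '%s\n' "$sample" |
      awk -F' S==' -v k1=another -v k2=report '
        {
          gsub("[^a-zA-Z0-9]", "", $2)
          if ($2 ~ k1 || $2 ~ k2) print $1
        }')

    printf '%s\n' "$matches"
    ```

    Only the second and third mail IDs are printed, since the first cleaned subject contains neither keyword. To run the same program against a log file, replace the printf pipeline with the file name as the last argument to awk.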

    I hope this helps.