regex, bash, email-address

What should I use in a bash script to extract email addresses from noisy lines in a file?


I have a file that has one email address per line. Some of them are noisy, i.e. contain junk characters before and/or after the address, e.g.

[email protected]<mailto
<[email protected]>
<[email protected]>Mobile
<[email protected]>
<[email protected]
[email protected]

How can I extract the right address from each line of the file in a loop like this?

for l in `cat file_of_email_addresses`
do
     # do magic here to extract address from $l
done

It looks like any garbage before the address always ends with lt;, and any garbage after the address always starts with &amp


Solution

  • Try this with GNU grep (-P enables Perl-compatible regular expressions, -o prints only the matched part of each line):

    grep -Po '[\w.-]+@[\w.-]+' file
    

    Output:

    [email protected]
    [email protected]
    [email protected]
    [email protected]
    [email protected]
    [email protected]
    

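    It's not perfect: the pattern does not validate addresses. For example, + is not in the character class, so a local part like user+tag is cut down to tag, and a dot directly after the domain ends up in the match. Perhaps it is still sufficient for your task.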
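
If the extraction has to happen inside a loop, as in the question, the same idea can be sketched in pure bash. This is only a sketch under a few assumptions: it relies on bash's [[ =~ ]] operator, which uses POSIX extended regular expressions where \w is not portable, so the character class is spelled out as [[:alnum:]_.-]. The function name extract_address and the sample lines in the here-document are made up for illustration; in a real script the loop would read from file_of_email_addresses instead.

```bash
#!/usr/bin/env bash

# Hypothetical helper: print the first address-like token found in $1.
# Returns non-zero (and prints nothing) when the line contains no match.
extract_address() {
    # ERE spelling of the grep pattern [\w.-]+@[\w.-]+
    local re='[[:alnum:]_.-]+@[[:alnum:]_.-]+'
    [[ $1 =~ $re ]] && printf '%s\n' "${BASH_REMATCH[0]}"
}

# Read line by line; IFS= and -r keep whitespace and backslashes intact.
# The here-document holds made-up sample input; replace it with
#   done < file_of_email_addresses
# to process the real file.
while IFS= read -r l; do
    addr=$(extract_address "$l") || continue   # skip lines without an address
    printf '%s\n' "$addr"
done <<'EOF'
alice@example.com<mailto
<bob@example.org>Mobile
<carol@example.net
no address here
EOF
```

Reading with while read instead of for l in `cat file` avoids word splitting and glob expansion on the file contents, which matters here because the noisy lines may contain characters such as < and >.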