python pandas dataframe numpy data-analysis

How to extract email addresses from a string in Python


I created a function that takes the entire string from any column in my dataset and extracts the email address. If there is no email, it should fill the space with NaN:

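import re
import numpy as np

def extract_email_ID(string):
    # Prefer an address enclosed in angle brackets, e.g. "Name <user@host>"
    email = re.findall(r'<(.+?)>', string)
    if not email:
        # Otherwise keep whitespace-separated tokens that contain '@'
        email = list(filter(lambda y: '@' in y, string.split()))
    return email[0] if email else np.nan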

I then applied the function to the "from" column of the dataset:

dfs['from'] = dfs['from'].apply(lambda x: extract_email_ID(x))

But I am getting the following error: TypeError: expected string or bytes-like object


Solution

  • It seems you have some non-string values in your column dfs['from'] (missing entries stored as NaN are a common cause), and re.findall only accepts strings or bytes-like objects. Perform a type check at the beginning of your function. I assume you also want to return np.nan whenever anything other than a string is detected, so you could insert this:

    if not isinstance(string, str):
        return np.nan
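
For reference, here is a minimal sketch of how the whole function might look with that check in place. The sample DataFrame is made up for illustration and is not the asker's data. The list comprehension is just a stylistic substitute for the original filter/lambda, and the function is passed to apply directly instead of being wrapped in a lambda.

```python
import re

import numpy as np
import pandas as pd


def extract_email_ID(string):
    # Non-string cells (None, NaN, numbers) would make re.findall raise
    # TypeError, so return NaN for them right away.
    if not isinstance(string, str):
        return np.nan
    # First look for text enclosed in angle brackets, e.g. "Name <user@host>".
    email = re.findall(r'<(.+?)>', string)
    if not email:
        # Otherwise keep whitespace-separated tokens that contain '@'.
        email = [token for token in string.split() if '@' in token]
    return email[0] if email else np.nan


# Hypothetical sample data (an assumption, not taken from the question).
dfs = pd.DataFrame({
    'from': [
        'Alice <alice@example.com>',
        'bob@example.com wrote',
        None,
        42,
        'no address here',
    ]
})

# The function can be passed to apply directly; no lambda is needed.
dfs['from'] = dfs['from'].apply(extract_email_ID)
print(dfs)
```

With this data, the first two rows end up holding the extracted addresses and the remaining three become NaN. Note that the angle-bracket pattern captures whatever sits between < and >, so it does not verify that the captured text is a valid email address.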