I created a function that takes the entire string from any column in my dataset and extracts the email address. If there is no email, it should fill the space with NaN:
def extract_email_ID(string):
    email = re.findall(r'<(.+?)>', string)
    if not email:
        email = list(filter(lambda y: '@' in y, string.split()))
    return email[0] if email else np.nan
I then applied the function to the "from" column of the dataset:
dfs['from'] = dfs['from'].apply(lambda x: extract_email_ID(x))
But I am getting the following error: TypeError: expected string or bytes-like object
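It seems to me you have some non-string values in your column dfs['from'], for example NaN (which is a float) in rows where the field is missing. re.findall raises exactly this TypeError when it is given anything other than a string or bytes-like object.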
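If you want to confirm this before changing the function, a quick check along the following lines should list the offending cells. This is only a sketch: the small DataFrame is made-up stand-in data, and the name non_strings is just for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real dataset (values invented for illustration).
dfs = pd.DataFrame({'from': ['Jane Doe <jane@example.com>', np.nan, 42]})

# Boolean mask that is True for every cell that is not a str instance.
non_strings = ~dfs['from'].map(lambda value: isinstance(value, str))

# Show only the rows that would make re.findall raise the TypeError.
print(dfs.loc[non_strings, 'from'])
```

Any row that shows up here is one that the original function cannot handle.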
Perform a type check at the beginning of your function. If anything other than a string is detected, I assume you also want to return np.nan.
So maybe you could insert this:
if not isinstance(string, str):
    return np.nan
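Put together, the function from the question with the type check at the top could look roughly like this. It is a sketch rather than a definitive version: the sample DataFrame is invented for illustration, and I swapped the filter/lambda for an equivalent list comprehension purely for readability.

```python
import re

import numpy as np
import pandas as pd


def extract_email_ID(string):
    # Non-string cells (NaN, None, numbers) cannot be passed to re.findall,
    # so return NaN for them right away.
    if not isinstance(string, str):
        return np.nan
    # First try an address wrapped in angle brackets, e.g. "Name <a@b.com>".
    email = re.findall(r'<(.+?)>', string)
    if not email:
        # Otherwise fall back to any whitespace-separated token containing '@'.
        email = [token for token in string.split() if '@' in token]
    return email[0] if email else np.nan


# Hypothetical sample data, made up for illustration.
dfs = pd.DataFrame({
    'from': ['Jane Doe <jane@example.com>', 'bob@example.com',
             np.nan, 42, 'no address here'],
})

# apply can take the function directly; the lambda wrapper is not needed.
dfs['from'] = dfs['from'].apply(extract_email_ID)
print(dfs)
```

Note that the function can be passed to apply directly, so the lambda wrapper from the question is optional.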