What I'm trying to do: Pass a column through a regex search in order to return a value that will be written to another column.
How: By writing a function with simple if-else clauses:
def category(series):
    pattern = 'microsoft|office|m365|o365'
    if re.search (series,pattern,re.IGNORECASE) != None:
        return 'Microsoft 365'
    else:
        return 'Not Microsoft 365'
df['Category'] = df['name'].apply(category)
Expected Output: A series with values set to Microsoft 365 or Not Microsoft 365
Actual Output: A series with None values
How I've solved it currently:
df.loc[df['name'].str.contains(pattern, case=False), 'Category'] = 'Microsoft 365'
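The workaround above only assigns a value to the rows that match and leaves the rest as None. As a minimal sketch (assuming NumPy, which pandas already depends on), the same str.contains mask can fill both labels in one vectorized step. The sample rows are illustrative: 'Microsoft' and 'M365' come from the dataset snippet, while 'other' is a made-up non-matching name.

```python
import numpy as np
import pandas as pd

pattern = 'microsoft|office|m365|o365'

# Illustrative data: 'other' is a hypothetical row that should not match.
df = pd.DataFrame({'name': ['Microsoft', 'M365', 'other']})

# str.contains treats `pattern` as a regex by default; case=False ignores case,
# and na=False makes missing names count as "no match" instead of NaN.
is_m365 = df['name'].str.contains(pattern, case=False, na=False)

# np.where takes the first label where the mask is True, the second otherwise.
df['Category'] = np.where(is_m365, 'Microsoft 365', 'Not Microsoft 365')
print(df)
```

This avoids calling a Python function once per row, which is usually faster than apply on large frames.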
A snippet of the dataset:
name | Category |
---|---|
Microsoft | None |
M365 | None |
I am trying to understand why the apply version did not work. Any insights would be appreciated. I'm fairly new to Pandas, so I'm not 100% sure what's going wrong.
Thank you!
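The arguments to `re.search` are in the wrong order. Its signature is `re.search(pattern, string, flags=0)`, so the regex must come first and the text to search second; the code in the question passes the cell value as the regex and the pattern as the text to search. With the order corrected, this should work: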
import pandas as pd
import re

df = pd.DataFrame({
    'name': ['Microsoft Exchange Pro', 'Microsoft', 'microsoft', 'office', 'Office', 'M365', 'm365', 'other'],
    'Category': [None, None, None, None, None, None, None, None]
})

# apply calls this once per cell, so `name` is a single string, not a Series
def category(name):
    pattern = 'microsoft|office|m365|o365'
    # re.search(pattern, string, flags): the regex first, then the text to search
    if re.search(pattern, name, re.IGNORECASE) is not None:
        return 'Microsoft 365'
    else:
        return 'Not Microsoft 365'

df['Category'] = df['name'].apply(category)
print(df)
Result:
name Category
0 Microsoft Exchange Pro Microsoft 365
1 Microsoft Microsoft 365
2 microsoft Microsoft 365
3 office Microsoft 365
4 Office Microsoft 365
5 M365 Microsoft 365
6 m365 Microsoft 365
7 other Not Microsoft 365
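To see concretely what the swapped call does, here is a small check using only the standard-library re module. In the question's version, each name becomes the regex and 'microsoft|office|m365|o365' becomes the text being searched. The names 'other' and 'soft' are made-up test values.

```python
import re

pattern = 'microsoft|office|m365|o365'

def matches(name: str) -> bool:
    # Correct order: the regex first, then the text to search in.
    return re.search(pattern, name, re.IGNORECASE) is not None

def matches_swapped(name: str) -> bool:
    # Swapped order (as in the question): `name` is used as the regex and
    # `pattern` as the text, so this only matches names that appear
    # somewhere inside 'microsoft|office|m365|o365'.
    return re.search(name, pattern, re.IGNORECASE) is not None

for name in ['Microsoft', 'M365', 'Microsoft Exchange Pro', 'soft', 'other']:
    print(f'{name!r}: correct={matches(name)}, swapped={matches_swapped(name)}')
```

The swapped call happens to agree for short names such as 'Microsoft' and 'M365', but it misses 'Microsoft Exchange Pro' and wrongly matches 'soft'. A name containing regex metacharacters, such as an unbalanced '(', would even raise re.error.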