I have a dictionary:
{'Consulting': {'Deloitte', 'EY', 'KPMG', 'PwC'},
'Education': {'.edu', 'College', 'University'},
'Government': {'state', '.gov', 'city'},
'Corporate': {'corpor', 'consumer', 'care'},
...... etc.}
I have a dataframe:
Sno Text column1 column2 ......
1 Deloitte.com
2 Texas.gov
3 smi@EY.com
4 UTD.edu
5 rapper@corporate.com
..... etc.
I want to use the dictionary to categorize the dataframe and build a column Category, like this:
Sno Text Category column1 column2 ......
1 Deloitte.com Consulting
2 Texas.gov Government
3 smi@EY.com Consulting
4 UTD.edu Education
5 rapper@corporate.com Corporate
..... etc.
How can I use this dictionary, which maps multiple values to each category, in Python to find a full phrase or part of a phrase in the Text column and categorize each row? Can the same logic be used when two matches exist in the same row? What happens then?
Also, this might sound vague, but the reason I am using a dictionary is that it lets me map multiple values to one category. Is there a better way to do this without a dictionary?
IIUC, you can first re-create your dict so that each keyword maps to its category, then use findall on the Text column and map the result back:
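# d is your original dict; invert it so each keyword maps to its category
newdict = {i: k for k, v in d.items() for i in v}
# find every keyword in each row, keep the first match, and map it to its category
df.Text.str.findall('|'.join(newdict.keys())).str[0].map(newdict)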
Out[431]:
0 Consulting
1 Government
2 Consulting
3 Education
4 Corporate
Name: Text, dtype: object
To store the result as a new column:
df['cate'] = df.Text.str.findall('|'.join(newdict.keys())).str[0].map(newdict)
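On the "what if 2 matches exist" part: findall returns every keyword it finds in a row, and .str[0] keeps only the leftmost one, so the first keyword in the text decides the category. Here is a rough, self-contained sketch of one way to make that explicit and also keep all matched categories. It assumes the dict is named d, uses only the four categories shown in the question, and adds a hypothetical sixth row ('Deloitte University') plus a hypothetical AllCategories column purely for illustration. It also escapes the keywords with re.escape, since '.' in '.gov' or '.edu' is otherwise treated as a regex wildcard.

```python
import re
import pandas as pd

# Sample data from the question; the last row is a hypothetical multi-match case.
d = {
    'Consulting': {'Deloitte', 'EY', 'KPMG', 'PwC'},
    'Education': {'.edu', 'College', 'University'},
    'Government': {'state', '.gov', 'city'},
    'Corporate': {'corpor', 'consumer', 'care'},
}
df = pd.DataFrame({'Text': ['Deloitte.com', 'Texas.gov', 'smi@EY.com',
                            'UTD.edu', 'rapper@corporate.com',
                            'Deloitte University']})

# Invert the dict: keyword -> category.
newdict = {kw: cat for cat, kws in d.items() for kw in kws}

# Escape each keyword so '.' is matched literally, and put longer keywords
# first so a short keyword cannot shadow a longer one at the same position.
keywords = sorted(newdict, key=len, reverse=True)
pattern = '|'.join(re.escape(kw) for kw in keywords)

# One list of matched keywords per row, in left-to-right order.
matches = df['Text'].str.findall(pattern)

# First match only (same rule as .str[0] above); rows with no match become NaN.
df['Category'] = matches.str[0].map(newdict)

# All distinct categories per row, in order of first appearance.
df['AllCategories'] = matches.apply(
    lambda found: list(dict.fromkeys(newdict[kw] for kw in found))
)

print(df)
```

Note that matching here is case-sensitive, so 'ey' would not match 'EY'. If you want case-insensitive matching you would need to normalize both the keywords and the text (for example, lowercasing both) before the lookup, because the map step looks keywords up by their exact spelling.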