My dataset has a column "High-Level-Keyword(s)" whose cells can contain more than one keyword, separated by '\n'. I want to group the data by these keywords.
I tried using unique(), but it treats 'Multilangant Systems', 'Multilangant Systems\nMachine Learning' and 'Machine Learning' as three different values.
I want the output to be like:
Multilangant - 2
Machine Learning - 2
but what I'm getting is
Multilangant - 1
Machine Learning - 1
Multilangant\nMachine Learning - 1
Can you suggest a way to do this?
You should .split on the separator, then count.
from collections import Counter
from itertools import chain
Counter(chain.from_iterable(df["High-Level-Keyword(s)"].str.split('\n')))
#Counter({'Machine Learning': 2, 'Multilangant': 2})
Or make it a Series:
import pandas as pd
pd.Series(Counter(chain.from_iterable(df["High-Level-Keyword(s)"].str.split('\n'))))
#Multilangant 2
#Machine Learning 2
#dtype: int64
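If you would rather stay entirely in pandas, one alternative is to split each cell and give every keyword its own row with explode, then count with value_counts. This is only a sketch: it assumes pandas 0.25 or newer (where explode was added), the sample DataFrame is made up to mirror the three rows in the question, and the "keyword" column name is hypothetical.

```python
import pandas as pd

# Hypothetical sample data mirroring the three rows described in the question.
df = pd.DataFrame({
    "High-Level-Keyword(s)": [
        "Multilangant",
        "Machine Learning",
        "Multilangant\nMachine Learning",
    ]
})

# Split each cell into a list of keywords, then give every keyword its own row.
# "keyword" is an illustrative column name, not something from the original data.
exploded = df.assign(
    keyword=df["High-Level-Keyword(s)"].str.split("\n")
).explode("keyword")

# Count how many rows mention each keyword.
counts = exploded["keyword"].value_counts()
print(counts)
```

Because exploded keeps the other columns of each original row, you can also call exploded.groupby("keyword") if you need to aggregate the rest of the data per keyword rather than just count.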