python pandas dataframe data-analysis

Is there a function to split rows in a dataframe if one of the columns contains more than one keyword?


My dataset has a column "High-Level-Keyword(s)" whose cells can contain more than one keyword, separated by '\n'. I want to group the data by these keywords.

I tried using unique(), but it treats 'Multilangant Systems', 'Multilangant Systems\nMachine Learning' and 'Machine Learning' as three different values.

I want the output to be like:

Multilangant - 2

Machine Learning - 2

but what I'm getting is

Multilangant - 1

Machine Learning - 1

Multilangant\nMachine Learning - 1

Can you suggest a way to do this?


Solution

  • Split each cell on the separator with .str.split, flatten the resulting lists, then count the individual keywords.

    from collections import Counter
    from itertools import chain
    
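    # .str.split turns each cell into a list of keywords; chain.from_iterable
    # flattens those lists into one stream that Counter tallies.
    # Note: a missing (NaN) cell is not iterable, so drop NaN values first if the column has any.
    Counter(chain.from_iterable(df["High-Level-Keyword(s)"].str.split('\n')))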
    #Counter({'Machine Learning': 2, 'Multilangant': 2})
    

    Or make it a Series:

    import pandas as pd
    pd.Series(Counter(chain.from_iterable(df["High-Level-Keyword(s)"].str.split('\n'))))
    #Multilangant        2
    #Machine Learning    2
    #dtype: int64
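
Since the question also asks about splitting rows and grouping by keyword, here is a minimal pandas-only sketch using explode (available in pandas 0.25 and later). The sample data and the "Title" column are assumptions made up for illustration; only the "High-Level-Keyword(s)" column name comes from the question.

```python
import pandas as pd

# Hypothetical sample data; "Title" is an invented column for illustration.
df = pd.DataFrame({
    "Title": ["Paper A", "Paper B", "Paper C"],
    "High-Level-Keyword(s)": [
        "Multilangant",
        "Multilangant\nMachine Learning",
        "Machine Learning",
    ],
})

# Split each cell into a list of keywords, then give every keyword its own row.
exploded = df.assign(
    Keyword=df["High-Level-Keyword(s)"].str.split("\n")
).explode("Keyword")

# Count how many original rows mention each keyword.
counts = exploded["Keyword"].value_counts()

# Group the original rows by individual keyword.
titles_by_keyword = exploded.groupby("Keyword")["Title"].apply(list)

print(counts)
print(titles_by_keyword)
```

Because every keyword ends up on its own row, the usual groupby and value_counts tools work without any extra flattening step, and a row that lists two keywords is counted once under each.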