import pandas as pd
corpus = pd.DataFrame([[1, 'A B C A D B A'], [2, 'B A B B C B A']],
                      columns=['id', 'sequence'])
corpus
Expected Output
A B C D
1 3 2 1 1
2 2 4 1 0
I have a DataFrame like the one above. For each id, I need to count how many times each character occurs in its space-separated sequence.
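The target can be stated directly: split each sequence on spaces and tally the tokens per row. Below is a minimal plain-Python sketch of that idea using collections.Counter. It is an alternative illustration, not the answer's method, and it assumes tokens are always separated by a single space.

```python
from collections import Counter

import pandas as pd

corpus = pd.DataFrame([[1, 'A B C A D B A'], [2, 'B A B B C B A']],
                      columns=['id', 'sequence'])

# One Counter per row: token -> number of times it appears in that row
counts = [Counter(seq.split(' ')) for seq in corpus['sequence']]

# Characters absent from a row come out as NaN, so fill them with 0,
# restore integer counts, and sort the columns alphabetically
out = (pd.DataFrame(counts, index=corpus['id'])
       .fillna(0)
       .astype(int)
       .sort_index(axis=1))
print(out)
```

This is easy to follow but loops in Python; for large frames the vectorized pandas chain is usually preferable.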
Try split, then explode, then str.get_dummies, and finally sum the dummy columns per id:
out = corpus.set_index('id').sequence.str.split(' ').explode().str.get_dummies().groupby(level=0).sum()
A B C D
1 3 2 1 1
2 2 4 1 0
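Another way to get the same table, sketched here as an alternative rather than taken from the answer, is to tabulate (id, character) pairs with pd.crosstab after splitting and exploding. Calling reset_index first gives a plain two-column frame, so crosstab works on a unique row index instead of the repeated id index.

```python
import pandas as pd

corpus = pd.DataFrame([[1, 'A B C A D B A'], [2, 'B A B B C B A']],
                      columns=['id', 'sequence'])

# One row per token; reset_index turns the repeated id index into a column
tokens = (corpus.set_index('id')['sequence']
          .str.split(' ')
          .explode()
          .reset_index())  # columns: 'id', 'sequence'

# Count each (id, character) pair; pairs that never occur are filled with 0
out = pd.crosstab(tokens['id'], tokens['sequence'])
print(out)
```

crosstab skips building the intermediate dummy columns, which keeps memory use lower when there are many distinct characters.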