I have a column called 'tags' in my pandas dataframe that is a list of multiple strings.
[abc, 123, xyz]
[456, 123]
[abc, 123, xyz]
and I have another column tech that has a single string each
win
mac
win
Please let me know if there is a way for me to get which element in tech appears most frequently for each element in tags. For example, 'abc' is most frequently associated with 'win' as compared to the other techs. So the output should look like :
abc win
123 win
xyz win
456 mac
IIUC , you can explode
the Tags
column and use crosstab
with idxmax
:
Input:
d = {'Tags':[['abc', 123, 'xyz'],[456, 123],['abc', 123, 'xyz']],
'tech':['win','mac','win']}
df = pd.DataFrame(d)
print(df)
Tags tech
0 [abc, 123, xyz] win
1 [456, 123] mac
2 [abc, 123, xyz] win
Solution:
m = df.explode('Tags')
out = pd.crosstab(m['Tags'],m['tech']).idxmax(1)
Tags
123 win
456 mac
abc win
xyz win
dtype: object