I've got a df that contains the columns profession and media. I would like to calculate the correlation between those two columns.
Is there a shortcut for calculating the correlation between columns of strings? Or do I have to transform each profession and media value to a number and then calculate the correlation with .corr()?
I found a similar question (Is there a way to get correlation with string data and a numerical value in pandas?), but I would like to compare each string as a whole, not each word within the string.
df
     profession   media
0  media lawyer   print
1       student  online
2       student   print
3     professor  online
4  media lawyer  online
You can convert each column to the categorical dtype, replace it with its integer category codes, and then call .corr() on the result:
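# Replace each string column with its integer category codes (this overwrites the original strings)
df['profession'] = df['profession'].astype('category').cat.codes
df['media'] = df['media'].astype('category').cat.codes
# Pearson correlation matrix of the encoded columns
df.corr()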
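For reference, here is a minimal self-contained sketch of the same approach on the sample data from the question. It encodes into a separate frame so the original strings are kept; the names codes and corr are only illustrative. Note that pandas assigns category codes in sorted order of the category labels by default, so the resulting coefficient depends on that arbitrary ordering and should be read with care for unordered categories.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "profession": ["media lawyer", "student", "student", "professor", "media lawyer"],
    "media": ["print", "online", "print", "online", "online"],
})

# Encode every column as integer category codes without modifying df.
# Codes follow the sorted order of the labels, e.g. 'media lawyer' -> 0,
# 'professor' -> 1, 'student' -> 2.
codes = df.apply(lambda col: col.astype("category").cat.codes)

# Pearson correlation matrix of the encoded columns
corr = codes.corr()

print(codes)
print(corr)
```

If you only need the single coefficient rather than the whole matrix, corr.loc["profession", "media"] picks it out.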