I have a dataset with a column containing text as follows:
Column1
----------------------------------------------------------
dapagliflozin 10 MG / metFORMIN hydrochloride
dapagliflozin 5 MG / metFORMIN hydrochloride
Fortamet
Glucophage
Glumetza
metFORMIN hydrochloride
metFORMIN hydrochloride / pioglitazone 15 MG
metFORMIN hydrochloride / pioglitazone 30 MG
I am trying to obtain the count of every unique word, for example, the count for metFORMIN, the count for hydrochloride, etc. I tried the table function, but it treats each whole row as a single value, which is not helpful.
We can use a combination of strsplit, unlist, and table. Split the column strings with strsplit, specifying the split as one or more whitespace characters (\\s+). The output will be a list. Use unlist to change the list to a vector, and then use table to get the count of each word.
table(unlist(strsplit(yourdf$Column1, '\\s+')))
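For reference, here is a self-contained sketch of the same approach using the sample rows from the question. It assumes the data frame is named yourdf, as above. The as.character() call, the removal of the "/" separator, and the final sort are optional additions for illustration, not part of the one-liner.

```r
# Rebuild the example column from the question as a character column.
yourdf <- data.frame(
  Column1 = c(
    "dapagliflozin 10 MG / metFORMIN hydrochloride",
    "dapagliflozin 5 MG / metFORMIN hydrochloride",
    "Fortamet",
    "Glucophage",
    "Glumetza",
    "metFORMIN hydrochloride",
    "metFORMIN hydrochloride / pioglitazone 15 MG",
    "metFORMIN hydrochloride / pioglitazone 30 MG"
  ),
  stringsAsFactors = FALSE
)

# as.character() guards against Column1 being a factor,
# which strsplit() does not accept.
words <- unlist(strsplit(as.character(yourdf$Column1), "\\s+"))

# Optional: drop the "/" separator so it is not counted as a word.
words <- words[words != "/"]

# Count each unique token.
word_counts <- table(words)

# Show the most frequent tokens first.
sort(word_counts, decreasing = TRUE)
```

With this sample, metFORMIN and hydrochloride are each counted 5 times. The counting is case-sensitive, so if the real data mixes spellings such as metFORMIN and Metformin, wrapping the column in tolower() before splitting would merge them.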