Analyze MySQL Text Data

This is a strange one but I have found the Stackoverflow community to be very helpful. I have mySQL Table with a column full of parsed text data. I want to analyze the data and see in how many rows words appear.

ID     columnName
1      Car
2      Dog
3      CAR CAR car CAR

From the above example what I want returned is that the word CAR appears in two rows and the word Dog Appears in 1 row. I don't really care how much the word count is as much as in how many rows does the word appear in. The problem is that I don't know which words to search for. Is there a tool, or something I can build in python, that would show me the most popular words used and in how many rows do the words appear in. I have not idea where to start and it would be great if someone could assist me with this.

Solution

I'd use python:

1) setup python to work with mysql (loads of tutorials online)

2) define:

from collections import defaultdict
tokenDict = defaultdict(lambda: 0)

the former is a simple dictionary which returns 0 if there is no value with the given key (i.e. tokenDict['i_have_never_used_this_key_before'] will return 0)

3) read each row from the table, tokenize it and increment the token counts

tokens = row.split(' ') //tokenize
tokens = [lower(t) for t in tokens] //lowercase
tokens = set(tokens) //remove duplicates
for token in tokens:
    tokenDict[token] = tokenDict[token] + 1