Rhyme Dictionary from CMU pronunciation database

I'm looking for a free or open source rhyming database.

I've found the CMU pronunciation "database" and its series of apps but I can't make sense of them or figure out where the data's coming from.

A simple text file with the word and its phonemes is all I need.

Does anybody here know where I'd find one or where I would begin to derive such a list from the CMU files?

Solution

cmudict

The cmudict is a text file and its format is really simple. First, the word is listed. Then, there are two spaces. Everything following the two spaces is the pronunciation. Where a word has more than one pronunciation you will see a separate entry for each one, with the alternates numbered in parentheses, like

word
word(1)

At the beginning of the file they've listed symbols and punctuation. Each symbol is followed by the English spelling of the symbol's name with no space between them. This is then followed by the two-space divider and the ARPAbet code. Since you're only looking for rhymes, you don't have to do anything special with the symbols section, because you're never going to be looking for a rhyme for ...ELLIPSIS
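To make the format concrete, here is a minimal Python sketch of parsing one line. The function name parse_line and the sample lines are made up for illustration, and skipping lines that start with ";;;" assumes the comment style of the classic cmudict releases, so check your copy of the file.

```python
import re

# Matches a trailing variant marker such as "(1)" on an alternate pronunciation.
VARIANT = re.compile(r"\(\d+\)$")

def parse_line(line):
    """Return (word, phonemes) for one cmudict line, or None if there is nothing to parse."""
    line = line.rstrip("\n")
    # Assumption: comment lines start with ";;;".
    if not line.strip() or line.startswith(";;;"):
        return None
    # The word and the pronunciation are divided by two spaces.
    word, sep, arpabet = line.partition("  ")
    if not sep:
        return None
    # Drop the "(1)" suffix so alternate pronunciations share the same word.
    word = VARIANT.sub("", word)
    return word, arpabet.split()

# Illustrative lines in the cmudict layout.
sample = [
    ";;; a comment line",
    "...ELLIPSIS  IH2 L IH1 P S IH0 S",
    "READ  R EH1 D",
    "READ(1)  R IY1 D",
]
for raw in sample:
    print(parse_line(raw))
```

The symbol entries need no special handling here: they split on the two-space divider like any other line.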

ARPAbet

The mapping from ARPAbet codes to IPA is listed on Wikipedia at http://en.wikipedia.org/wiki/Arpabet and each mapping shows example words. It's pretty easy to see how the two relate to one another, and that may help you to understand how to read the ARPAbet codes if you are familiar with IPA.
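If it helps to see the relationship in code, here is a rough Python sketch that converts an ARPAbet string to IPA. The table is deliberately partial (only the handful of symbols used in the example) and the names ARPABET_TO_IPA and to_ipa are mine. In the cmudict, vowels carry a trailing stress digit (0, 1 or 2), which the sketch strips before the lookup.

```python
# Partial ARPAbet-to-IPA table; a real one would cover every symbol.
ARPABET_TO_IPA = {
    "AE": "æ", "IY": "i", "EH": "ɛ", "AY": "aɪ",
    "K": "k", "T": "t", "R": "ɹ", "D": "d", "M": "m",
}

def to_ipa(arpabet):
    """Convert a space-separated ARPAbet string to IPA, using "?" for unmapped symbols."""
    out = []
    for symbol in arpabet.split():
        base = symbol.rstrip("012")  # drop the stress digit on vowels
        out.append(ARPABET_TO_IPA.get(base, "?"))
    return "".join(out)

print(to_ipa("K AE1 T"))  # kæt
print(to_ipa("R IY1 D"))  # ɹid
```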

Summary

Basically, if you've already found the cmudict then you've already got what you asked for: a database of words and their pronunciations. To find words that rhyme you'll have to parse the flat file into a table and run a query to find words whose ARPAbet codes end with the same phonemes.

General Theory of Doing Stuff to Things

Part: Stuff

create a new database
create a table in the database with three fields: index, word, arpabet
read the cmudict file line by line
for each line, split it into two parts where the two consecutive spaces are found,
increment the index count, then insert the index number, word, and ARPAbet code
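The steps above could be sketched in Python with the built-in sqlite3 module, roughly like this. The table name words, the column name idx (INDEX is a reserved word in SQL) and the function name load_cmudict are my own choices, and the ";;;" comment check is an assumption about your copy of the file. The sample lines are illustrative; in practice you would pass an open file handle instead of the list.

```python
import sqlite3

def load_cmudict(lines, conn):
    """Insert every parseable cmudict line into a three-column table; return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS words ("
        "idx INTEGER PRIMARY KEY, word TEXT, arpabet TEXT)"
    )
    index = 0
    for line in lines:
        line = line.rstrip("\n")
        # Assumption: comment lines start with ";;;".
        if not line or line.startswith(";;;"):
            continue
        # Split where the two consecutive spaces are found.
        word, sep, arpabet = line.partition("  ")
        if not sep:
            continue
        index += 1
        conn.execute(
            "INSERT INTO words VALUES (?, ?, ?)",
            (index, word, arpabet.strip()),
        )
    conn.commit()
    return index

conn = sqlite3.connect(":memory:")  # use a file path to keep the database around
sample = ["CAT  K AE1 T", "HAT  HH AE1 T", "DOG  D AO1 G"]
print(load_cmudict(sample, conn))
print(conn.execute("SELECT * FROM words").fetchall())
```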

Then Umm...

Once you've got the data into whatever kind of database you choose, you can then use that database to find correlations between the ARPAbet codes. You could find rhymes, consonance, assonance, and other mnemonic devices. It would go something like this:

Part: Thing

get a word you want to find a rhyme for
query the database for the ARPAbet equivalent of the word
split the ARPAbet code into pieces by breaking it up everywhere there is a space
take the last piece of the code and query the database for words whose ARPAbet codes end with that piece
Do fancy things with the rhymes
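Here is a rough Python sketch of those steps against a SQLite table. It assumes a table called words with columns idx, word and arpabet, and that words are stored in upper case; all of those names, the find_rhymes function and the sample rows are just for illustration. The tail parameter is my own addition: the default of 1 matches only the last piece, as described above, while a larger value compares more trailing pieces and gives tighter rhymes.

```python
import sqlite3

def find_rhymes(conn, word, tail=1):
    """Return words whose ARPAbet code ends with the last `tail` pieces of `word`'s code."""
    row = conn.execute(
        "SELECT arpabet FROM words WHERE word = ?", (word.upper(),)
    ).fetchone()
    if row is None:
        return []
    # Split the code everywhere there is a space and keep the ending.
    pieces = row[0].split(" ")
    ending = " ".join(pieces[-tail:])
    # The leading space in the LIKE pattern stops "T" from matching inside another symbol.
    rows = conn.execute(
        "SELECT word FROM words "
        "WHERE (arpabet = ? OR arpabet LIKE ?) AND word != ? "
        "ORDER BY word",
        (ending, "% " + ending, word.upper()),
    )
    return [r[0] for r in rows]

# Tiny illustrative table so the sketch runs on its own.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (idx INTEGER PRIMARY KEY, word TEXT, arpabet TEXT)")
conn.executemany("INSERT INTO words VALUES (?, ?, ?)", [
    (1, "CAT", "K AE1 T"),
    (2, "HAT", "HH AE1 T"),
    (3, "BOOT", "B UW1 T"),
    (4, "DOG", "D AO1 G"),
])
print(find_rhymes(conn, "cat"))          # ['BOOT', 'HAT']
print(find_rhymes(conn, "cat", tail=2))  # ['HAT']
```

Matching on the last piece alone is loose (CAT and BOOT only share the final T), which is why comparing a longer ending is usually worth trying.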

Shortcuts and Spoilers

I got bored and wrote a Node.js module that covers "Part: Stuff" listed above. If you've got Node.js installed on your machine, you can get the module by running npm install cmudict-to-sqlite. See https://npmjs.org/package/cmudict-to-sqlite for the README, or just look in the module for docs.