
Find words and combinations of words that can be spoken the quickest


I'm a big fan of discovering sentences that can be rapped very quickly. For example, "gotta read a little bit of Wikipedia" or "don't wanna wind up in the gutter with a bottle of malt." (George Watsky)

I wanted to write a program in Python that would help me find words (or combinations of words) that sound very fast when spoken.

I initially thought that words with a high syllable-to-letter ratio would be the best, but when I wrote a Python program to find those words, I got only very simple words that didn't really sound fast (e.g. "iowa").

So I'm at a loss as to what actually makes words sound fast. Is it the morpheme-to-letter ratio? Is it the number of alternating vowel-consonant pairs?

How would you guys go about devising a Python program to solve this problem?


Solution

  • This is just a stab in the dark, as I'm not a linguist (although I have written a voice synthesizer), but the metric that might be useful here is the number of phonemes that make up each word, since each phoneme is going to have approximately the same duration regardless of where it is used. There's an International Phonetic Alphabet chart for English dialects, as well as a nice phonology of English.

    A good open-source phonetic dictionary is available from the cmudict project, which has about 130k words.

    Here's a really quick stab at a lookup program:

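    #!/usr/bin/env python3
    
    # Maps each upper-case word to the rest of its cmudict line.
    words = {}
    
    # latin-1 accepts any byte, so stray non-ASCII entries cannot raise.
    with open("cmudict.0.7a", encoding="latin-1") as f:
        for line in f:
            if line.startswith(";;;"):  # skip the file's comment lines
                continue
            # Keep everything after the first space, minus the newline.
            split_idx = line.find(' ')
            words[line[0:split_idx]] = line[split_idx+1:-1]
    
    user_input = input("Words: ")
    
    print()
    for word in user_input.split():
        try:
            print("%25s %s" % (word, words[word.upper()]))
        except KeyError:  # word is not in the dictionary
            print("%25s %s" % (word, 'unable to find phonems for word'))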
    

    When run:

    Words: I support hip hop from the underground up
    
                        I  AY1
                  support  S AH0 P AO1 R T
                      hip  HH IH1 P
                      hop  HH AA1 P
                     from  F R AH1 M
                      the  DH AH0
              underground  AH1 N D ER0 G R AW2 N D
                       up  AH1 P
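
    To turn the phoneme idea into a number, here is a minimal sketch that counts phonemes and syllables in a cmudict pronunciation string. It relies on the fact that cmudict marks every vowel phoneme with a stress digit (0, 1 or 2). The phonemes_per_syllable score and the SAMPLE dictionary are my own illustrative names, and treating "fewer phonemes per syllable" as "sounds faster" is only an assumption, not an established measure.

```python
def phoneme_count(pron):
    """Number of phonemes in a space-separated cmudict pronunciation."""
    return len(pron.split())


def syllable_count(pron):
    """Count vowel phonemes, which are the ones ending in a stress digit."""
    return sum(1 for p in pron.split() if p[-1].isdigit())


def phonemes_per_syllable(pron):
    """Hypothetical score: lower means fewer sounds packed around each beat."""
    syllables = syllable_count(pron)
    return phoneme_count(pron) / syllables if syllables else float("inf")


# Pronunciations copied from the lookup output above.
SAMPLE = {
    "support": "S AH0 P AO1 R T",
    "hip": "HH IH1 P",
    "from": "F R AH1 M",
    "the": "DH AH0",
    "underground": "AH1 N D ER0 G R AW2 N D",
    "up": "AH1 P",
}

# Sort words from the lowest score to the highest.
ranked = sorted(SAMPLE, key=lambda w: phonemes_per_syllable(SAMPLE[w]))
for word in ranked:
    pron = SAMPLE[word]
    print("%12s  %d phonemes, %d syllables"
          % (word, phoneme_count(pron), syllable_count(pron)))
```

    The same functions could be run over the whole words dictionary from the lookup program to rank all of cmudict, though whether the ranking matches what your ear calls "fast" is something you'd have to test.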
    

    If you want to get super fancy pants about this, there's always the Python Natural Language Toolkit which may have some useful tidbits for you.

    Additionally, here is some real-world use. To be fair, I fixed 'stylin' to 'styling', but I left 'tellin' alone to reveal the deficiency with unknown words. You could probably handle words ending in in' by substituting a g for the apostrophe, looking that word up, and then dropping the NG phoneme from the result.

                      Yes  Y EH1 S
                      the  DH AH0
                   rhythm  R IH1 DH AH0 M
                      the  DH AH0
                    rebel  R EH1 B AH0 L
                  Without  W IH0 TH AW1 T
                        a  AH0
                    pause  P AO1 Z
                      I'm  AY1 M
                 lowering  L OW1 ER0 IH0 NG
                       my  M AY1
                    level  L EH1 V AH0 L
                      The  DH AH0
                     hard  HH AA1 R D
                   rhymer  R AY1 M ER0
                    where  W EH1 R
                      you  Y UW1
                    never  N EH1 V ER0
                     been  B IH1 N
                      I'm  AY1 M
                       in  IH0 N
                      You  Y UW1
                     want  W AA1 N T
                  styling  S T AY1 L IH0 NG
                      you  Y UW1
                     know  N OW1
                     it's  IH1 T S
                     time  T AY1 M
                    again  AH0 G EH1 N
                        D  D IY1
                      the  DH AH0
                    enemy  EH1 N AH0 M IY0
                   tellin unable to find phonems for word
                      you  Y UW1
                       to  T UW1
                     hear  HH IY1 R
                       it  IH1 T
                     They  DH EY1
                  praised  P R EY1 Z D
                  etc...
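
    The fallback for clipped "-in'" words mentioned above could be sketched like this. The lookup function and the toy dictionary are hypothetical names for illustration. One deliberate variation: instead of deleting the final NG outright, this swaps it for N, which is my assumption about how the clipped form is actually pronounced. It also accepts the spelling without an apostrophe, as in 'tellin'.

```python
def lookup(word, words):
    """Return the pronunciation for word, or None if it cannot be found.

    words maps upper-case spellings to space-separated phoneme strings.
    """
    key = word.upper()
    if key in words:
        return words[key]
    # Fallback: treat "tellin'" (or "tellin") as a clipped "telling".
    stem = key.rstrip("'")
    if stem.endswith("IN"):
        phonemes = words.get(stem + "G", "").split()
        if phonemes and phonemes[-1] == "NG":
            # Assumption: the clipped form ends in N rather than NG.
            phonemes[-1] = "N"
            return " ".join(phonemes)
    return None


# Tiny stand-in dictionary; both entries are copied from the output above.
toy = {"STYLING": "S T AY1 L IH0 NG", "IN": "IH0 N"}

print(lookup("stylin'", toy))  # S T AY1 L IH0 N
print(lookup("in", toy))       # IH0 N (found directly, no fallback)
print(lookup("tellin", toy))   # None, since TELLING is not in the toy dict
```

    With the full cmudict loaded, 'tellin' would resolve through its "-ing" entry if the dictionary has one.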
    

    If this is something you plan on putting some time into, I'd be interested in helping. I think putting 'World's first rapping IDE' on my resume would be hilarious. And if one exists already, world's first Python-based rapping IDE. :p