
Counting the words a character said in a movie script


With some help I have already managed to extract the spoken words. Now I want to get the text spoken by a chosen character, so that I can type in MIA and get every single word she says in the movie, like this:

name = input("Enter name:")
wordsspoken(script, name)
name1 = input("Enter another name:")
wordsspoken(script, name1)

That way I can count the words afterwards.

This is what the movie script looks like:

An awkward beat. They pass a wooden SALOON -- where a WESTERN
 is being shot. Extras in COWBOY costumes drink coffee on the
 steps.
                     Revision                        25.


                   MIA (CONT'D)
      I love this stuff. Makes coming to work
      easier.

                   SEBASTIAN
      I know what you mean. I get breakfast
      five miles out of the way just to sit
      outside a jazz club.

                   MIA
      Oh yeah?

                   SEBASTIAN
      It was called Van Beek. The swing bands
      played there. Count Basie. Chick Webb.
             (then,)
      It's a samba-tapas place now.

                   MIA
      A what?

                   SEBASTIAN
      Samba-tapas. It's... Exactly. The joke's on
      history.

Solution

  • If you want to compute the tally in a single pass over the script (which I imagine could be pretty long), you can track which character is currently speaking, like a little state machine:

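    import re
    from collections import Counter, defaultdict
    
    # SCRIPT is assumed to hold the whole screenplay as one string.
    words_spoken = defaultdict(Counter)   # speaker -> word frequencies
    currently_speaking = 'Narrator'       # owns any text before the first cue
    
    for line in SCRIPT.splitlines():
        # A cue is a line holding only an upper-case name, e.g. "MIA (CONT'D)".
        name = line.replace("(CONT'D)", '').strip()
        if re.fullmatch(r'[A-Z]+', name):
            currently_speaking = name
        else:
            # Note: action lines and parentheticals such as "(then,)" are
            # also credited to whoever spoke last.
            words_spoken[currently_speaking].update(line.split())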
    

    You could use a more sophisticated regex to detect when the speaker changes, but this should do the trick.
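
For what a stricter variant might look like, here is a minimal sketch, not a definitive parser. It rests on assumptions taken only from the sample in the question: a cue is an all-caps name alone on its line, optionally followed by a parenthetical; a speech ends at the first blank line; and lines made up solely of a parenthetical, like `(then,)`, are directions rather than dialogue. `CUE` and `speeches` are hypothetical names, and `wordsspoken` is only a guess at the helper the question calls.

```python
import re
from collections import defaultdict

# Assumed cue format: an all-caps name, optionally followed by a
# parenthetical such as (CONT'D). Multi-word names like "DR. SMITH" match too.
CUE = re.compile(r"([A-Z][A-Z .']*?)\s*(?:\([^)]*\))?")


def speeches(script):
    """Map each speaker to the list of words they say, in order."""
    spoken = defaultdict(list)
    speaker = None  # None means we are outside any dialogue block
    for line in script.splitlines():
        text = line.strip()
        if not text:
            speaker = None  # assumption: a blank line ends the speech
        elif speaker is None:
            cue = CUE.fullmatch(text)
            if cue:
                speaker = cue.group(1)
            # any other line here is action or a page header: ignored
        elif not (text.startswith("(") and text.endswith(")")):
            # dialogue line; directions like "(then,)" are skipped
            spoken[speaker].extend(text.split())
    return spoken


def wordsspoken(script, name):
    """Return the words spoken by `name` (hypothetical helper)."""
    return speeches(script).get(name.strip().upper(), [])
```

With this, `len(wordsspoken(script, "MIA"))` gives a word count, and wrapping the result in `collections.Counter` gives per-word frequencies. Words keep their punctuation, and an all-caps scene heading could still be mistaken for a cue, so real scripts may need further tuning.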
