I already managed to uncover the spoken words with some help. Now I'm looking for to get the text spoken by a chosen person. So I can type in MIA and get every single words she is saying in the movie Like this:
name = input("Enter name:")
wordsspoken(script, name)
name1 = input("Enter another name:")
wordsspoken(script, name1)
So I'm able to count the words afterwards.
This is how the movie script looks like
An awkward beat. They pass a wooden SALOON -- where a WESTERN
is being shot. Extras in COWBOY costumes drink coffee on the
steps.
Revision 25.
MIA (CONT'D)
I love this stuff. Makes coming to work
easier.
SEBASTIAN
I know what you mean. I get breakfast
five miles out of the way just to sit
outside a jazz club.
MIA
Oh yeah?
SEBASTIAN
It was called Van Beek. The swing bands
played there. Count Basie. Chick Webb.
(then,)
It's a samba-tapas place now.
MIA
A what?
SEBASTIAN
Samba-tapas. It's... Exactly. The joke's on
history.
If you want to compute your tally with only one pass over the script (which I imagine could be pretty long), you could just track which character is speaking; set things up like a little state machine:
import re
from collections import Counter, defaultdict
words_spoken = defaultdict(Counter)
currently_speaking = 'Narrator'
for line in SCRIPT.split('\n'):
name = line.replace('(CONT\'D)', '').strip()
if re.match('^[A-Z]+$', name):
currently_speaking = name
else:
words_spoken[currently_speaking].update(line.split())
You could use a more sophisticated regex to detect when the speaker changes, but this should do the trick.