Here is a screenshot of the file I am working with. It contains 'L194'.
Here is the full line, to show that after splitting it has exactly 5 elements, like the other lines:
L194 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ Can we make this quick? Roxanne Korrine and Andrew Barrett are having an incredibly horrendous public break- up on the quad. Again.
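A quick way to check the element count is to split that sample line with both delimiters that appear in the code below. This is a minimal sketch that copies the line above into a string literal instead of reading the file. Both splits give 5 fields, so a `len(extract) == 5` check passes either way. The first field shows the difference.

```python
# The sample line from movie_lines.txt, copied as a string literal.
sample = ("L194 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ "
          "Can we make this quick? Roxanne Korrine and Andrew Barrett are "
          "having an incredibly horrendous public break- up on the quad. Again.")

loose = sample.split('+++$+++')    # delimiter without the surrounding spaces
tight = sample.split(' +++$+++ ')  # delimiter with the surrounding spaces

print(len(loose), len(tight))      # both splits produce 5 fields
print(repr(loose[0]), repr(tight[0]))  # the loose split keeps a trailing space
```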
```python
movie_lines = open('movie_lines.txt', mode='r', encoding='utf-8', errors='ignore').read().split('\n')
movie_convo_lines = open('movie_conversations.txt', mode='r', encoding='utf-8', errors='ignore').read().split('\n')
map_line_id_with_text = {}
for line in movie_lines:
    extract = line.split('+++$+++')
    if len(extract) == 5:
        map_line_id_with_text[extract[0]] = extract[4]
list_of_lineids = []
for line in movie_convo_lines[:-1]:
    extract = line.split(' +++$+++ ')[-1][1:-1].replace("'", "").replace(" ", "")
    list_of_lineids.append(extract.split(','))
prompts = []
responses = []
for _ in list_of_lineids:
    for i in range(len(_) - 1):
        prompts.append(map_line_id_with_text[_[i]])
        responses.append(map_line_id_with_text[_[i + 1]])
limit = 0
for i in range(limit, limit + 5):
    print(prompts[i])
    print(responses[i])
```
When I run this code, I keep getting the error above, even though the file I opened does contain 'L194', so I am confused about why it fails. The error is raised on the prompts.append line.
The delimiter for your movie lines should be ' +++$+++ ' (with spaces), which you used correctly in your second for loop. In your first for loop, however, you used '+++$+++' (without spaces), so every line ID extracted there keeps a trailing space ('L194 ' rather than 'L194'). The IDs in list_of_lineids have no such space, so they are never found as keys in map_line_id_with_text, and the lookup inside prompts.append raises a KeyError.
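Here is a minimal sketch of the corrected parsing. It runs on tiny in-memory samples instead of the real files. The lines L195 and L196 and the conversation line are hypothetical placeholders, modelled on the format that the question's parsing code expects. The only real change is that both loops now split on the same spaced delimiter.

```python
DELIM = ' +++$+++ '  # field separator, including the surrounding spaces

# Hypothetical in-memory stand-ins for the two files (only L194 is real).
movie_lines = [
    "L194 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ Can we make this quick?",
    "L195 +++$+++ u2 +++$+++ m0 +++$+++ CAMERON +++$+++ (hypothetical reply)",
    "L196 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ (hypothetical answer)",
]
movie_convo_lines = ["u0 +++$+++ u2 +++$+++ m0 +++$+++ ['L194', 'L195', 'L196']"]

map_line_id_with_text = {}
for line in movie_lines:
    extract = line.split(DELIM)  # same delimiter as the conversation loop
    if len(extract) == 5:
        map_line_id_with_text[extract[0]] = extract[4]

list_of_lineids = []
# No [:-1] here: unlike read().split('\n'), this list has no trailing empty line.
for line in movie_convo_lines:
    extract = line.split(DELIM)[-1][1:-1].replace("'", "").replace(" ", "")
    list_of_lineids.append(extract.split(','))

prompts, responses = [], []
for conversation in list_of_lineids:
    # Pair each line with the line that follows it in the same conversation.
    for i in range(len(conversation) - 1):
        prompts.append(map_line_id_with_text[conversation[i]])
        responses.append(map_line_id_with_text[conversation[i + 1]])

for prompt, response in zip(prompts, responses):
    print(prompt, '->', response)
```

Calling `.strip()` on each field after splitting would also remove the stray spaces. Using the same delimiter in both loops, however, keeps the two parsers consistent.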