python dataset keyerror

KeyError: 'L194' Python


Here is a screenshot of the file I am working with. It contains 'L194'.

Here's the full line, just to show that after splitting, it only has 5 elements, like the other sentences.

L194 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ Can we make this quick? Roxanne Korrine and Andrew Barrett are having an incredibly horrendous public break- up on the quad. Again.

movie_lines = open('movie_lines.txt', mode = 'r', encoding = 'utf-8', errors = 'ignore').read().split('\n')
movie_convo_lines = open('movie_conversations.txt', mode ='r', encoding = 'utf-8', errors='ignore').read().split('\n')

map_line_id_with_text = {}

for line in movie_lines:
    extract = line.split('+++$+++')
    if len(extract)== 5:
        map_line_id_with_text[extract[0]] = extract[4]

list_of_lineids = []
for line in movie_convo_lines[:-1]:
    extract = line.split(' +++$+++ ')[-1][1:-1].replace("'","").replace(" ","")
    list_of_lineids.append(extract.split(','))


prompts = []
responses = [] 


for _ in list_of_lineids:
    for i in range(len(_)-1):
        prompts.append(map_line_id_with_text[_[i]])
        responses.append(map_line_id_with_text[_[i+1]])

limit = 0 
for i in range (limit, limit+5):
    print(prompts[i])
    print(responses[i])

When I run this code, I keep getting the error above, even though the file I opened does contain 'L194', so I am confused about why it fails. The traceback points to the prompts.append line.



Solution

  • The delimiter in movie_lines.txt is ' +++$+++ ' (with a space on each side). You used it correctly in your second for loop, but your first for loop splits on '+++$+++' (without spaces). The line IDs extracted there therefore keep a trailing space ('L194 ' rather than 'L194'), so the IDs from list_of_lineids are not found in map_line_id_with_text.
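
The difference between the two delimiters can be checked on the single line quoted in the question. This is a minimal sketch rather than a drop-in replacement for the full script: the helper build_line_map and the names DELIMITER, sample_lines, buggy_map and fixed_map are made up for illustration and do not appear in the original code.

```python
# Delimiter with surrounding spaces, as used in the second loop of the question.
DELIMITER = ' +++$+++ '

# The one line quoted in the question, used here instead of reading movie_lines.txt.
sample_lines = [
    "L194 +++$+++ u0 +++$+++ m0 +++$+++ BIANCA +++$+++ Can we make this quick? "
    "Roxanne Korrine and Andrew Barrett are having an incredibly horrendous "
    "public break- up on the quad. Again.",
]


def build_line_map(lines, delimiter):
    """Map line ID -> line text, mirroring the first loop in the question."""
    mapping = {}
    for line in lines:
        parts = line.split(delimiter)
        if len(parts) == 5:
            mapping[parts[0]] = parts[4]
    return mapping


buggy_map = build_line_map(sample_lines, '+++$+++')  # delimiter without spaces
fixed_map = build_line_map(sample_lines, DELIMITER)  # delimiter with spaces

print(list(buggy_map))     # ['L194 ']  (note the trailing space)
print(list(fixed_map))     # ['L194']
print('L194' in buggy_map) # False, which is why the lookup raises KeyError
print('L194' in fixed_map) # True
```

In the original script, the equivalent change would be to split on ' +++$+++ ' in the first loop. Calling .strip() on each extracted field would probably also work, but using one delimiter in both loops keeps the parsing consistent.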