Tags: python, for-loop, dictionary-comprehension, ordereddictionary, string-search

How do I open a list of txt files, convert them to strings, and then see if they match any keys from a given dictionary?


I have a list of text files (list_4), named cv1.txt, cv2.txt, cv3.txt and so on, up to cv20.txt. I want to use a for loop to open and read these files individually and convert them into strings. This is my code:

list_5 = []
for i in list_4:
    file = open(i)
    line = file.read().replace('\n', '')
    list_5.append(line)
    file.close()
print(list_5)

This part of the code works: it opens and reads each txt file in list_4 and stores its contents as a string in list_5.
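For reference, the same read step can be sketched with a context manager, which closes each file even if reading fails. The helper name read_as_strings and the two temporary stand-in files are assumptions made only so the sketch runs on its own; the real list_4 would hold the cv1.txt to cv20.txt names.

```python
import tempfile
from pathlib import Path


def read_as_strings(paths):
    """Return the contents of each file as one string with newlines removed."""
    texts = []
    for path in paths:
        # The with-block closes the file even if read() raises.
        with open(path, encoding="utf-8") as handle:
            texts.append(handle.read().replace("\n", ""))
    return texts


# Stand-in data: two small files in a temporary directory (hypothetical contents).
with tempfile.TemporaryDirectory() as tmp:
    list_4 = []
    for name, body in [("cv1.txt", "the cow\ngoes moo\n"),
                       ("cv2.txt", "the cat goes meow\n")]:
        path = Path(tmp) / name
        path.write_text(body, encoding="utf-8")
        list_4.append(path)
    list_5 = read_as_strings(list_4)

print(list_5)
```

Note that replacing '\n' with an empty string joins the last word of one line to the first word of the next ("cow" and "goes" become "cowgoes" here). Replacing with a space instead keeps the words apart, which may matter for the key search that follows.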

Now I have a dictionary called my_dict, of the format {'abandoned': -1, 'abandonment': 1, 'abandon': 0, ...}

I want to use a for loop to compare the previously generated string elements from list_5 against the keys of my_dict, and output a series of integers (the matching values) for each string element of list_5.

For example:

list_6 = []
for key in my_dict:
    for i in list_4:
        file = open(i, 'r')
        line = file.read()
        file.close()
        if key in line:
            list_6.append(my_dict[key])
print(list_6)

However, the issue is that the output of this for loop is a jumbled series of values and file names:

['-1cv1.txt', '-1cv8.txt', '-1cv17.txt', '1cv4.txt', '1cv6.txt', '1cv1.txt', ...]

obtained using:

list_6 = []
for key in my_dict:
    for i in list_4:
        file = open(i, 'r')
        line = file.read()
        file.close()
        if key in line:
            list_6.append(str(my_dict[key]) + i)
print(list_6)

Is there any way for me to get the values specific to each string element in list_5, i.e.

list_5: ['the cow goes moo', 'the cat goes meow',...]
list_6: [[0,1,-1],[0,0,0],...]

I might need to use a list within a list? Not sure; any help would be appreciated!


Solution

  • If I'm understanding the issue correctly, you'd like your final output to look something like this:

    [ ( 'the cow goes moo', [0, 1, -1] ), ( 'the cat goes meow', [0, 0, 0]),... ]
    

    If so, maybe try:

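    list_6 = []  # must exist before the loop appends to it
    for line in list_5:  # iterate over the strings in list_5, not the file names in list_4
        sub_list = []  # values found in this string only
        for key in my_dict:
            if key in line:  # substring test
                sub_list.append(my_dict[key])
        list_6.append(sub_list)
    combined = list(zip(list_5, list_6))
    print(combined)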
    

    (If all the line items are really whitespace delimited, the script can be sped up by splitting each line and iterating over that rather than the dictionary keys, but ignoring that for now...) Hopefully, this helps.
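The split-based variant mentioned above could look roughly like the sketch below. The helper names values_by_word and values_by_substring and the sample strings are made up for illustration. The substring version is included only for comparison: because `in` on strings is a substring test, 'abandon' also matches inside 'abandoned', whereas the word-based lookup only matches whole words.

```python
def values_by_substring(text, scores):
    """Values of every key that occurs anywhere in text (substring test)."""
    return [value for key, value in scores.items() if key in text]


def values_by_word(text, scores):
    """Values of every whitespace-separated word that is itself a key."""
    return [scores[word] for word in text.split() if word in scores]


# Hypothetical sample data in the same shape as the question's variables.
my_dict = {"abandoned": -1, "abandonment": 1, "abandon": 0}
list_5 = ["the barn was abandoned", "they abandon the plan", "the cow goes moo"]

# One sub-list of values per string, in the same order as list_5.
list_6 = [values_by_word(line, my_dict) for line in list_5]
print(list(zip(list_5, list_6)))

# For comparison: the substring test also picks up 'abandon' inside 'abandoned'.
print(values_by_substring(list_5[0], my_dict))
```

The word-based version does one dictionary lookup per word instead of scanning the text once per key. It assumes the words are separated by whitespace and match the keys exactly, so punctuation and capitalisation would need to be stripped or normalised first. It also returns values in the order the words appear in the text and counts repeated words each time, which differs from the key-order output of the loop above.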