python, list, dictionary, nlp

Creating a nested dictionary in Python


I have a dictionary in Python and it currently looks like this:

{'apple': ['file1.txt', 'file2.txt', 'file3.txt'], 'banana': ['file1.txt', 'file2.txt'],
'carrot': ['file3.txt'],.....................................}

The contents of each file are stored in a list of lists, where each inner list holds the words from one file. I also have a general list of the files used:

[['hello', 'apple', 'test', 'banana'], ['weird', 'apple', 'tester', 'banana', 'apple'], ...]

['file1.txt', 'file2.txt', .....]

I would now like to create a new nested dictionary that contains all the information from the previous one, plus the positions at which each term appears in each document (if it appears in that document at all).

For example: I'd want print(dictionary['apple']) to return [{'file1.txt': [1]}, {'file2.txt': [1,4]},...... ] (it tells me the document it appears in AND its position in that document)

My existing code for creating the dictionary I already have is:


dictionary = {}  # named `dictionary` to avoid shadowing the built-in dict
for i in range(len(textfile_list)):  # list of text files used
    check = file_contents[i]  # words of the i-th file; file_contents has the form [['word1',..],['word2','wordn',...]]
    for item in words:  # a list of every word from every file ['word1','wordn','word3',...]
        if item in check:
            if item not in dictionary:
                dictionary[item] = []
            dictionary[item].append(textfile_list[i])

dictionary = {k: list(set(v)) for k, v in dictionary.items()}

How would I do this?


Solution

  • You could organize your workflow as follows. Use this as a source of inspiration:

    content = [['hello', 'apple', 'test', 'banana'], ['weird', 'apple', 'tester', 'banana', 'banana', 'apple']]
    files = ['file1.txt', 'file2.txt']
    # Map each file name to its list of words.
    index = dict(zip(files, content))
    # Every distinct word across all files.
    words = {word for file_words in index.values() for word in file_words}
    expected_dict = {}
    for word in words:
        expected_dict[word] = []
        for key, value in index.items():
            if word in value:
                # Record every position at which the word occurs in this file.
                expected_dict[word].append({key: [idx for idx, w in enumerate(value) if w == word]})

    Output (key order may vary, because words is a set):

    {'test': [{'file1.txt': [2]}],
     'apple': [{'file1.txt': [1]}, {'file2.txt': [1, 5]}],
     'banana': [{'file1.txt': [3]}, {'file2.txt': [3, 4]}],
     'tester': [{'file2.txt': [2]}],
     'hello': [{'file1.txt': [0]}],
     'weird': [{'file2.txt': [0]}]}
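
As a possible variation on the approach above, the positions can be collected in a single pass over every word with enumerate, rather than rescanning each file once per distinct word. The sketch below is only one way to do it: it reuses the sample data from the question, and the names positions and nested are illustrative, not taken from the original code. It assumes Python 3.7+ so that dictionaries keep insertion order.

```python
from collections import defaultdict

# Sample data from the question; file names and word lists are illustrative.
files = ['file1.txt', 'file2.txt']
content = [['hello', 'apple', 'test', 'banana'],
           ['weird', 'apple', 'tester', 'banana', 'apple']]

# word -> {filename -> [positions]}, filled in one pass over every word.
positions = defaultdict(dict)
for filename, words in zip(files, content):
    for idx, word in enumerate(words):
        positions[word].setdefault(filename, []).append(idx)

# Convert to the list-of-single-key-dicts shape requested in the question.
nested = {word: [{name: idxs} for name, idxs in per_file.items()]
          for word, per_file in positions.items()}

print(nested['apple'])  # [{'file1.txt': [1]}, {'file2.txt': [1, 4]}]
```

If the list-of-dicts shape is not a hard requirement, the intermediate positions mapping may be more convenient to query directly, for example positions['apple']['file2.txt'].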