
Simplifying Python Dictionary Comprehension


I have a task where I need to write a procedure that takes a list of strings and returns a dictionary mapping each word that appears in any of the input strings to the set of string numbers of all strings in which that word appears. In the actual problem, the strings are paragraphs of text, and they are numbered starting from 1.

Here is an example input->output:

L = ['a b c d e', 'a b b c c', 'd e f f']

makeInverseIndex(L) -> {'a': [1, 2], 'b': [1, 2], 'c': [1, 2], 'd': [1, 3], 'e': [1, 3], 'f': [3]}

I have two working solutions:

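def makeInverseIndex(strlist):
    InvInd = {}
    for i, d in enumerate(strlist):
        for w in d.split():
            if w not in InvInd:
                # First occurrence of this word: start its list of string numbers
                InvInd[w] = [i+1]
            elif i+1 not in InvInd[w]:
                # Word already seen; record this string number only once
                InvInd[w].append(i+1)
    return InvInd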

def makeInverseIndex2(strlist):
    # Test against the split words, not the raw string, so 'a' does not match inside 'cat'
    return {x: [d+1 for d in range(len(strlist)) if x in strlist[d].split()]
            for w in strlist for x in w.split()}
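A caveat on the membership test in comprehensions like these: when the right-hand side is a string, x in s checks for a substring, not a whole word, so a short word can match inside a longer one. Testing against s.split() restricts the match to whole words. The two functions below are hypothetical and differ only in that test; the input docs is made up to trigger the difference:

```python
def index_substring(strlist):
    # Membership test against the raw string: matches substrings
    return {x: [d for d, v in enumerate(strlist, start=1) if x in v]
            for w in strlist for x in w.split()}


def index_words(strlist):
    # Membership test against the split words: matches whole words only
    return {x: [d for d, v in enumerate(strlist, start=1) if x in v.split()]
            for w in strlist for x in w.split()}


docs = ['cat', 'a']
print(index_substring(docs))  # {'cat': [1], 'a': [1, 2]}  ('a' found inside 'cat')
print(index_words(docs))      # {'cat': [1], 'a': [2]}
```

On the example list L both versions give the same result, because every word there is a single letter that never appears inside another word.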

My question is whether the dict comprehension can be simplified in any way using enumerate. The textbook question hints that I should use enumerate, but I cannot figure out how to apply it.
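For reference, enumerate pairs each item with a counter, and its start parameter sets the first counter value, which is exactly the 1-based string number the problem asks for. A minimal illustration on the example list (the name paragraphs is only for illustration):

```python
paragraphs = ['a b c d e', 'a b b c c', 'd e f f']

# enumerate yields (counter, item) pairs; start=1 makes the counter 1-based
numbered = list(enumerate(paragraphs, start=1))
print(numbered)
# [(1, 'a b c d e'), (2, 'a b b c c'), (3, 'd e f f')]
```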

Here is my best attempt, although I know it is wrong: w is assigned only inside the inner list comprehension, so it is not defined when the outer clause for x in w.split() is evaluated:
def makeInverseIndex3(strlist): return {x:[i for i, w in enumerate(strlist) if x in strlist[i]]
                                             for x in w.split()}
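The error in this attempt can be reproduced in isolation. In a comprehension, the outermost iterable (here w.split()) is evaluated in the enclosing scope, and the w bound by the inner list comprehension is local to that inner comprehension, so Python raises NameError. A small sketch, assuming no variable named w exists outside the comprehension:

```python
strlist = ['a b c d e', 'a b b c c', 'd e f f']

caught = None
try:
    # Same shape as the attempt: the outer clause uses w before anything binds it
    attempt = {x: [i for i, w in enumerate(strlist) if x in strlist[i]]
               for x in w.split()}
except NameError as err:
    caught = err  # keep a reference; err itself is unbound after the except block
    print('NameError:', err)
```

The usual fix is to bind w in an earlier for clause of the same comprehension, before the clause that calls w.split().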

I feel close, and the solution is probably obvious, but I just can't nut it out!

Thanks


Solution

  • Using a dictionary comprehension with enumerate

    def makeInverseIndex4(strlist):
        # d is the 0-based position; test x against the split words of v, not the raw string
        return {x: [d+1 for d, v in enumerate(strlist) if x in v.split()]
                for w in strlist for x in w.split()}
    

    Alternatively, pass start=1 to enumerate instead of computing d + 1:

    def makeInverseIndex5(strlist):
        # enumerate(..., start=1) yields the 1-based string numbers directly
        return {x: [d for d, v in enumerate(strlist, start=1) if x in v.split()]
                for w in strlist for x in w.split()}
    

    Output

    {'a': [1, 2], 'b': [1, 2], 'c': [1, 2], 'd': [1, 3], 'e': [1, 3], 'f': [3]}
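The problem statement asks for sets of string numbers, and the comprehensions above rescan every string once for each word occurrence. A single pass with collections.defaultdict(set) avoids both issues. This is a sketch that swaps the comprehension for a plain loop, with the hypothetical name make_inverse_index_sets:

```python
from collections import defaultdict


def make_inverse_index_sets(strlist):
    """Map each word to the set of 1-based string numbers containing it."""
    index = defaultdict(set)
    for number, text in enumerate(strlist, start=1):
        for word in text.split():
            # A set ignores repeated words within the same string automatically
            index[word].add(number)
    return dict(index)


L = ['a b c d e', 'a b b c c', 'd e f f']
print(make_inverse_index_sets(L))
# {'a': {1, 2}, 'b': {1, 2}, 'c': {1, 2}, 'd': {1, 3}, 'e': {1, 3}, 'f': {3}}
```

Each word is visited once, so the work grows with the total number of words rather than with (words × strings). If the textbook specifically wants a comprehension, the enumerate versions above remain the closer fit.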