python machine-learning information-retrieval imdb inverted-index

How to build a term-document mapping in Python


I have 16,000 records from an IMDb dataset like this:

Movie_Name         Synops 
Alien Predator     ['great','17th', 'abigail', 'by', 'century', 'is']
Shark Exorcist     ['demonic', 'devil', 'great', 'hell', 'holy', 'nun']
Jurassic Shark     ['abandoned', 'an', 'and', 'beautiful', 'abigail',]

I don't know how to build a term-document mapping that lists, for each word in the Synops column, the movies it appears in, like this:

"great": Alien Predator, Shark Exorcist
"17th": Alien Predator
"abigail": Alien Predator, Jurassic Shark
.....

Solution

  • data = {
        "Alien Predator": ['great','17th', 'abigail', 'by', 'century', 'is'],
        "Shark Exorcist": ['demonic', 'devil', 'great', 'hell', 'holy', 'nun'],
        "Jurassic Shark": ['abandoned', 'an', 'and', 'beautiful', 'abigail',]
    }
    
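    # Invert the mapping: each keyword points to the movies that contain it.
    result = {}
    for movie_name, keywords in data.items():
        for keyword in keywords:
            # setdefault creates the list the first time a keyword is seen.
            result.setdefault(keyword, []).append(movie_name)
    print(result)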
    

    Result (newlines added for clarity):

    {
    'great': ['Alien Predator', 'Shark Exorcist'], 
    '17th': ['Alien Predator'], 
    'abigail': ['Alien Predator', 'Jurassic Shark'], 
    'by': ['Alien Predator'], 
    'century': ['Alien Predator'], 
    'is': ['Alien Predator'], 
    'demonic': ['Shark Exorcist'], 
    'devil': ['Shark Exorcist'], 
    'hell': ['Shark Exorcist'], 
    'holy': ['Shark Exorcist'], 
    'nun': ['Shark Exorcist'], 
    'abandoned': ['Jurassic Shark'], 
    'an': ['Jurassic Shark'],
    'and': ['Jurassic Shark'], 
    'beautiful': ['Jurassic Shark']
    }
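
If a synopsis can list the same word twice, the approach above would record the movie twice for that word. A minimal sketch of one way around that, assuming the same `data` dictionary as above, collects titles in sets using `collections.defaultdict`. The helper names `build_index` and `search_all` are hypothetical and chosen only for illustration; `search_all` shows how the finished index could answer a "movies containing all of these words" query.

```python
from collections import defaultdict

data = {
    "Alien Predator": ['great', '17th', 'abigail', 'by', 'century', 'is'],
    "Shark Exorcist": ['demonic', 'devil', 'great', 'hell', 'holy', 'nun'],
    "Jurassic Shark": ['abandoned', 'an', 'and', 'beautiful', 'abigail'],
}

def build_index(docs):
    """Map each term to a sorted list of the titles that contain it.

    Hypothetical helper: titles are gathered in sets so a word repeated
    within one synopsis does not add the same movie twice.
    """
    index = defaultdict(set)
    for title, terms in docs.items():
        for term in terms:
            index[term].add(title)
    # Convert to plain lists so the result prints and serializes cleanly.
    return {term: sorted(titles) for term, titles in index.items()}

def search_all(index, *terms):
    """Return the titles that contain every one of the given terms."""
    if not terms:
        return []
    postings = [set(index.get(term, ())) for term in terms]
    return sorted(set.intersection(*postings))

index = build_index(data)
print(index["great"])                         # ['Alien Predator', 'Shark Exorcist']
print(search_all(index, "great", "abigail"))  # ['Alien Predator']
```

Sorting the titles is a design choice that makes the output deterministic; if insertion order matters more, the list-based version above keeps the order in which movies were encountered.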