python regex python-3.x list data-extraction

Removing word-interchanged elements of a list in Python


I have a list containing repeated values whose words have been reordered. For example:

dataList=["john is student", "student is john", "john student is", "john is student", "alica is student", "good weather", "weather good"]

I want to replace each of these reordered duplicates with its first-seen form, as shown:

Expected output:

dataList=["john is student","john is student", "john is student","john is student","alica is student", "good weather", "good weather"]

The code I am trying to use is:

for i in dataList:
    first = (i.split()[0] + i.split()[1] + i.split()[2]) in studentList
    ........

I am stuck on working out the logic. How can I get the required result?


Solution

  • If you treat the first occurrence as the correct form to keep in the final list, you can try the following:

    dataList= ["john is student", 
               "student is john", 
               "john student is", 
               "alica is student", 
               "good weather", 
               "weather good",
              ]
    
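    # Map each set of words to the first phrase seen with exactly those words.
    data = {}
    for words in dataList:
        data.setdefault(frozenset(words.split()), words)
    
    # In Python 3, dict.values() returns a view, so convert it to a list.
    dataList = list(data.values())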
    

    Edit

    Since my last answer, the question has been updated with the requirement to keep the repeated values.

    [Answer]

    dataList= ["john is student", 
               "student is john", 
               "john student is",
               "alica is student",
               "good weather", 
               "weather good",
              ]
    
    class WordFrequence:
        def __init__(self, word, frequence=1):
            self.word = word
            self.frequence = frequence
    
        def as_list(self):
            return [self.word] * self.frequence
    
        def __repr__(self):
            return "{}({}, {})".format(self.__class__.__name__, self.word, self.frequence)    
    
    counter = {} 
    for words in dataList:
        key = frozenset(words.split())
        if key in counter:
            counter[key].frequence += 1
        else:
            counter[key] = WordFrequence(words)
    
    dataList = [] # this is what you need
    for wf in counter.values():
        dataList.extend(wf.as_list())
    

    For a long input dataList, you may be able to reduce the per-object overhead by replacing the WordFrequence class with a recordclass (a third-party, mutable namedtuple-like type).
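
As a possible alternative, here is a minimal sketch that uses the same frozenset key as the answer above but rewrites each element in place, so every phrase keeps its original position in the list. The function name `canonicalize` is hypothetical and not part of the original answer. The sketch also assumes that two phrases count as duplicates whenever they contain the same set of words.

```python
def canonicalize(items):
    """Replace each phrase with the first-seen phrase made of the same words."""
    first_seen = {}  # frozenset of words -> first phrase seen with those words
    result = []
    for phrase in items:
        key = frozenset(phrase.split())
        # setdefault stores the phrase only when the key is new,
        # and always returns the phrase stored for that key.
        result.append(first_seen.setdefault(key, phrase))
    return result


dataList = ["john is student", "student is john", "john student is",
            "john is student", "alica is student", "good weather", "weather good"]
print(canonicalize(dataList))
```

Note that a frozenset ignores repeated words, so "a a b" and "a b b" would be treated as the same phrase. If word counts matter, one option is to use `tuple(sorted(phrase.split()))` as the key instead.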