I have a list containing repeated values whose words have been reordered. For example:
dataList=["john is student", "student is john", "john student is", "john is student", "alica is student", "good weather", "weather good"]
I want to replace all of these reordered duplicates with a single form, as shown:
Expected output:
dataList=["john is student","john is student", "john is student","john is student","alica is student", "good weather", "good weather"]
The code I am trying to use is:
for i in dataList:
    first = (i.split()[0] + i.split()[1] + i.split()[2]) in studentList
........
I am stuck on working out the logic. How can I get the required result?
If the first occurrence is the one you want to keep in the final list, you can try the following:
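dataList = ["john is student",
            "student is john",
            "john student is",
            "alica is student",
            "good weather",
            "weather good",
            ]
data = {}
for words in dataList:
    # frozenset ignores word order, so reordered variants share one key;
    # setdefault stores only the first phrase seen for each key
    data.setdefault(frozenset(words.split()), words)
dataList = list(data.values())
# dataList is what you need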
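The same idea can be wrapped in a small function, which makes it easier to try on other inputs. This is only a sketch: the name `dedupe_by_words` is mine, not from the question, and it relies on dicts keeping insertion order (Python 3.7+).

```python
def dedupe_by_words(items):
    """Keep the first phrase seen for each set of words (hypothetical helper)."""
    first_seen = {}
    for text in items:
        # Phrases with the same words in any order map to the same key.
        first_seen.setdefault(frozenset(text.split()), text)
    return list(first_seen.values())


sample = ["john is student", "student is john", "john student is",
          "alica is student", "good weather", "weather good"]
unique = dedupe_by_words(sample)
print(unique)
```

One caveat: a frozenset also ignores how many times a word occurs, so "a a b" and "b a" would be treated as the same phrase. If that matters for your data, `tuple(sorted(text.split()))` could be used as the key instead.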
Edit
Since my last answer, the question has been updated with the requirement to keep the repeated values.
[Answer]
dataList = ["john is student",
            "student is john",
            "john student is",
            "alica is student",
            "good weather",
            "weather good",
            ]


class WordFrequence:
    def __init__(self, word, frequence=1):
        self.word = word
        self.frequence = frequence

    def as_list(self):
        return [self.word] * self.frequence

    def __repr__(self):
        return "{}({}, {})".format(self.__class__.__name__, self.word, self.frequence)


counter = {}
for words in dataList:
    key = frozenset(words.split())
    if key in counter:
        counter[key].frequence += 1
    else:
        counter[key] = WordFrequence(words)

dataList = []  # this is what you need
for wf in counter.values():
    dataList.extend(wf.as_list())
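Note that the loop above groups equal phrases together, so the output order matches the input only when the variants are already adjacent. If each item should stay at its original position, a simpler sketch is to replace every phrase with the first-seen form for its word set. The function name `normalize_to_first_seen` is my own and purely illustrative.

```python
def normalize_to_first_seen(items):
    """Replace each phrase with the first phrase seen that has the same words."""
    canonical = {}
    result = []
    for text in items:
        key = frozenset(text.split())
        # setdefault returns the stored phrase, or stores and returns this one.
        result.append(canonical.setdefault(key, text))
    return result


question_list = ["john is student", "student is john", "john student is",
                 "john is student", "alica is student", "good weather",
                 "weather good"]
normalized = normalize_to_first_seen(question_list)
print(normalized)
```

On the list from the question this gives four copies of "john is student", then "alica is student", then two copies of "good weather".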
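For a long input dataList, you can improve my code by replacing WordFrequence with recordclass, a third-party package that provides a mutable, lighter-weight alternative to a hand-written class.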
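If you would rather not install anything, a rough standard-library alternative is to drop the per-key object entirely and keep the counts in a `collections.Counter`. This is a sketch under that assumption, not a benchmarked replacement, and `expand_groups` is a name I made up for it.

```python
from collections import Counter


def expand_groups(items):
    """Group phrases by word set, then repeat the first-seen phrase per group."""
    first_seen = {}     # word set -> first phrase seen
    counts = Counter()  # word set -> number of occurrences
    for text in items:
        key = frozenset(text.split())
        first_seen.setdefault(key, text)
        counts[key] += 1
    expanded = []
    for key, text in first_seen.items():
        expanded.extend([text] * counts[key])
    return expanded


answer_list = ["john is student", "student is john", "john student is",
               "alica is student", "good weather", "weather good"]
grouped = expand_groups(answer_list)
print(grouped)
```

It produces the same grouped output as the WordFrequence version while storing only a string and an integer per distinct word set.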