I have a nested containing millions of other lists (using tuples atm). For each list, an element may only be included once. I thought that each list was unique, so I needed them all, but I recently realized my nested list contained pairs like this:
listA = ('77', '131', '212', '69')
listB = ('69', '212', '131', '77')
While listA and listB are unique, one is just the reversed duplicate of the other. I need to retain every unique combination because order is important.
listC = ('131', '69', '77', '212')
So listC, while using the same elements, is considered unique because of the order and would need to be retained.
I can cut my nested list down by a huge amount (about half) if I remove all the duplicates, but I can't find a way to do this in a time efficient way.
Because it may be best to eliminate these reversed duplicates before they are even added to my nested list, below I've included the class I use to make the list.
class Graph(object):
def __init__(self, graph_dict=None):
""" Initializes a graph object.
If no dictionary or None is given,
an empty dictionary will be used. """
if graph_dict == None:
graph_dict = {}
self.__graph_dict = graph_dict
def find_all_paths(self, start_vertex, end_vertex, path=[]):
""" Find all paths from start_vertex to end_vertex in graph """
graph = self.__graph_dict
path = path + [start_vertex]
if start_vertex == end_vertex:
return [path]
if start_vertex not in graph:
return []
paths = []
for vertex in graph[start_vertex]:
if vertex not in path:
extended_paths = self.find_all_paths(vertex, end_vertex, path)
for p in extended_paths:
if len(p) >= 2:
p = tuple(p)
paths.append(p)
return paths
graph = Graph(vertexGraphDict)
nestedList= graph.find_all_paths(begin, end)
vertexGraphDict is just a dictionary of vertices as keys with the values being a list of the other vertices to which it is connected.
I have tried to eliminate the reversed duplicates using the following method:
reversedLists = []
for item in nestedLists:
if item in reversedLists:
nestedLists.remove(item)
else:
revItem = item[::-1]
reversedLists.append(revItem)
This method is very slow. I have also tried revItem = list(reversed(item)) after removing the line p = tuple(p) in my class; very slow as well. Trying these methods during the list production saves time overall but does not speed up the elimination process, which is key.
You can build an OrderedDict
with the key being the tuple in reversed order only if the last item is lower than the first item, and the value being the tuple itself, and then get the list of values of the OrderedDict
:
from collections import OrderedDict
l = [
('77', '131', '212', '69'),
('69', '212', '131', '77'),
('1', '2', '3', '4'),
('4', '1', '2', '3'),
('4', '3', '2', '1')
]
list(OrderedDict((t[::-1] if t[-1] < t[0] else t, t) for t in l).values())
Or if you're using Python 3.7 or later versions, where dict keys are ordered, you can use a dict in place of OrderedDict
:
list({t[::-1] if t[-1] < t[0] else t: t for t in l}.values())
This returns:
[('69', '212', '131', '77'), ('4', '3', '2', '1'), ('4', '1', '2', '3')]