For an application I'm working on, I'm searching a directory of files, and expecting to find matching pairs of files to perform some further analysis on.
In this case, a pair is defined as matching on some subset of attributes, but differing in some other attributes.
As part of the error handling/warning, I want to identify any files found that are "incomparable," i.e. files for which the expected "partner" in the pair is not found.
I have a class of objects to store the structured attribute information, and when I read files in the directory, I store each file I find as an element in a list of these objects.
Here's a silly, simple example:
class glove(object):
    def __init__(self, size, color, is_right):
        self.size = size
        self.color = color
        self.is_right = is_right

    def __repr__(self):
        if self.is_right:
            hand = "right"
        else:
            hand = "left"
        s = "{} {} {}".format(self.size, self.color, hand)
        return(s)

gloves = [glove('med', 'black', False),
          glove('med', 'black', True),
          glove('lg', 'black', False),
          glove('lg', 'black', True),
          glove('med', 'brown', False),
          glove('med', 'brown', True),
          glove('lg', 'blue', False),
          glove('med', 'tan', False)]

left_gloves = [x for x in gloves if not x.is_right]
right_gloves = [x for x in gloves if x.is_right]
Let's assume that there are no duplicate elements in the list, and let's define a "pair" as two glove objects that have matching glove.size and glove.color but different values of glove.is_right (i.e. one is right and one is left).
Now I'd like to identify incomplete pairs (perhaps into a list of leftovers) so that I could error or warn appropriately, e.g. "No right lg blue glove found" or "No right med tan glove found."
I've seen answers that teach how to identify items "missing" from pairs of lists, but my application has a couple of complexities that I couldn't figure out how to address: linking on attributes of an object, and linking on multiple attributes of an object.
I imagine something is possible with for loops and list comprehension, but I can't quite figure out how to link it all together.
It's pretty easy if you can implement equality/hash for your class:
class glove(object):
    def __init__(self, size, color, is_right):
        self.size = size
        self.color = color
        self.is_right = is_right

    def __repr__(self):
        if self.is_right:
            hand = "right"
        else:
            hand = "left"
        s = "{} {} {}".format(self.size, self.color, hand)
        return(s)

    def __eq__(self, other):
        return isinstance(other, glove) and \
            other.size == self.size and \
            other.color == self.color \
            and other.is_right == self.is_right

    def __hash__(self):
        return hash((self.size, self.color, self.is_right))

gloves = [glove('med', 'black', False),
          glove('med', 'black', True),
          glove('lg', 'black', False),
          glove('lg', 'black', True),
          glove('med', 'brown', False),
          glove('med', 'brown', True),
          glove('lg', 'blue', False),
          glove('med', 'tan', False)]

gloves_set = set(gloves)
unpaired = [g for g in gloves if glove(g.size, g.color, not g.is_right) not in gloves_set]
print(unpaired)
Output:
[lg blue left, med tan left]
You can also consider using namedtuple, which provides field-based equality and hashing for you.
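As a rough sketch of the namedtuple suggestion (the name Glove and the use of _replace to build the expected partner are my own choices here, not part of the original class), it could look something like this:

```python
from collections import namedtuple

# Hypothetical namedtuple stand-in for the glove class. Instances compare
# and hash by their (size, color, is_right) fields, so no __eq__/__hash__
# needs to be written by hand.
Glove = namedtuple("Glove", ["size", "color", "is_right"])

gloves = [Glove('med', 'black', False),
          Glove('med', 'black', True),
          Glove('lg', 'black', False),
          Glove('lg', 'black', True),
          Glove('med', 'brown', False),
          Glove('med', 'brown', True),
          Glove('lg', 'blue', False),
          Glove('med', 'tan', False)]

gloves_set = set(gloves)

# _replace returns a copy with the given field changed, i.e. the partner
# we expect to find for each glove.
unpaired = [g for g in gloves
            if g._replace(is_right=not g.is_right) not in gloves_set]
print(unpaired)
```

Note that this loses the custom __repr__ from the question, so the printed objects show the default namedtuple representation instead of "lg blue left".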
Here is an alternative that does not require implementing equals and hash, nor creating new objects:
class glove(object):
    def __init__(self, size, color, is_right):
        self.size = size
        self.color = color
        self.is_right = is_right

    def __repr__(self):
        if self.is_right:
            hand = "right"
        else:
            hand = "left"
        s = "{} {} {}".format(self.size, self.color, hand)
        return(s)

gloves = [glove('med', 'black', False),
          glove('med', 'black', True),
          glove('lg', 'black', False),
          glove('lg', 'black', True),
          glove('med', 'brown', False),
          glove('med', 'brown', True),
          glove('lg', 'blue', False),
          glove('med', 'tan', False)]

# With plain dict
glove_search = {}
for g in gloves:
    glove_search.setdefault(g.size, {}).setdefault(g.color, {})[g.is_right] = True
unpaired = [g for g in gloves
            if not glove_search.get(g.size, {}).get(g.color, {}).get(not g.is_right, False)]

# Or, more idiomatically, with defaultdict
from collections import defaultdict
glove_search = defaultdict(lambda: defaultdict(lambda: defaultdict(bool)))
for g in gloves:
    glove_search[g.size][g.color][g.is_right] = True
unpaired = [g for g in gloves if not glove_search[g.size][g.color][not g.is_right]]

print(unpaired)
Output:
[lg blue left, med tan left]
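If the goal is the warning text itself rather than the leftover objects, a flatter variant of the same lookup idea may be enough: build a set of (size, color, is_right) tuples and check for each glove's opposite hand. This is only a sketch; the helper name missing_partner_warnings and the message format are assumptions of mine, loosely modelled on the wording in the question.

```python
class glove(object):
    def __init__(self, size, color, is_right):
        self.size = size
        self.color = color
        self.is_right = is_right


def missing_partner_warnings(gloves):
    """Return one message per glove whose opposite hand is absent.

    Hypothetical helper: the name and message format are illustrative.
    """
    # Plain tuples hash and compare by value, so the class itself
    # needs no __eq__/__hash__.
    seen = {(g.size, g.color, g.is_right) for g in gloves}
    messages = []
    for g in gloves:
        if (g.size, g.color, not g.is_right) not in seen:
            # The missing glove is the opposite hand of the one we have.
            missing_hand = "left" if g.is_right else "right"
            messages.append(
                "No {} {} {} glove found".format(missing_hand, g.size, g.color))
    return messages


gloves = [glove('med', 'black', False),
          glove('med', 'black', True),
          glove('lg', 'blue', False),
          glove('med', 'tan', False)]

for message in missing_partner_warnings(gloves):
    print(message)
```

To pair on a different subset of attributes, only the tuple built for seen and the tuple used in the membership test need to change.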