
matching lists of objects based on attributes, and identifying the incomparables


For an application I'm working on, I'm searching a directory of files, and expecting to find matching pairs of files to perform some further analysis on.

In this case, a pair is defined as matching on some subset of attributes, but differing in some other attributes.

As part of the error handling/warning, I want to identify any files found that are "incomparable," i.e. files for which the expected "partner" in the pair is not found.

I have a class of objects to store the structured attribute information, and when I read the files in the directory, I store each file I find as an element in a list of these objects.

Here's a silly, simple example:

class glove(object):
    def __init__(self, size, color, is_right):
        self.size = size
        self.color = color
        self.is_right = is_right

    def __repr__(self):
        if self.is_right:
            hand = "right"
        else:
            hand = "left"
        s = "{} {} {}".format(self.size, self.color, hand)
        return(s)


gloves = [glove('med', 'black', False),
          glove('med', 'black', True),
          glove('lg', 'black', False),
          glove('lg', 'black', True),
          glove('med', 'brown', False),
          glove('med', 'brown', True),
          glove('lg', 'blue', False),
          glove('med', 'tan', False)]

left_gloves = [x for x in gloves if not x.is_right]
right_gloves = [x for x in gloves if x.is_right]

Let's assume that there are no duplicate elements in the list, and let's define a "pair" as two glove objects that have matching glove.size and glove.color but different values of glove.is_right (i.e. one is right and one is left).

Now I'd like to identify incomplete pairs (perhaps collecting them into a list of leftovers) so that I can raise an error or warning as appropriate, e.g. "No right lg blue glove found", "No right med tan glove found".

I've seen answers that teach how to identify items "missing" from pairs of lists, but my application has a couple of complexities that I couldn't figure out how to address: linking on attributes of an object, and linking on multiple attributes of an object.
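For the "multiple attributes" part, one common trick is to collapse the attributes that define a pair into a single tuple, which compares and hashes like any other value. A minimal sketch using operator.attrgetter from the standard library; pair_key is a hypothetical name, not something from the original code:

```python
from operator import attrgetter


class glove(object):
    def __init__(self, size, color, is_right):
        self.size = size
        self.color = color
        self.is_right = is_right


# attrgetter with several names returns a callable that builds a tuple,
# so several attributes can be compared (or hashed) as one composite key.
pair_key = attrgetter('size', 'color')

a = glove('lg', 'black', False)
b = glove('lg', 'black', True)
c = glove('lg', 'blue', False)

print(pair_key(a))                 # ('lg', 'black')
print(pair_key(a) == pair_key(b))  # True: same size and color
print(pair_key(a) == pair_key(c))  # False: colors differ
```

Because the key is a plain tuple, it can also be used as a dict key or set element to group gloves that belong together.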

I imagine something is possible with for loops and list comprehension, but I can't quite figure out how to link it all together.
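A direct translation of the pair definition into a list comprehension is sketched below. is_pair is a hypothetical helper; the any() scan compares every glove with every other glove, so it does O(n²) work, which is fine for a handful of files but gets slow for large directories:

```python
class glove(object):
    def __init__(self, size, color, is_right):
        self.size = size
        self.color = color
        self.is_right = is_right

    def __repr__(self):
        hand = "right" if self.is_right else "left"
        return "{} {} {}".format(self.size, self.color, hand)


def is_pair(a, b):
    """Hypothetical helper: same size and color, opposite hands."""
    return (a.size == b.size
            and a.color == b.color
            and a.is_right != b.is_right)


gloves = [glove('med', 'black', False),
          glove('med', 'black', True),
          glove('lg', 'black', False),
          glove('lg', 'black', True),
          glove('med', 'brown', False),
          glove('med', 'brown', True),
          glove('lg', 'blue', False),
          glove('med', 'tan', False)]

# Keep a glove only if no other glove in the list completes its pair
unpaired = [g for g in gloves
            if not any(is_pair(g, other) for other in gloves)]
print(unpaired)  # [lg blue left, med tan left]
```

A glove never pairs with itself here, because is_pair requires the hands to differ.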


Solution

  • It's pretty easy if you implement equality and hashing (__eq__ and __hash__) for your class:

    class glove(object):
        def __init__(self, size, color, is_right):
            self.size = size
            self.color = color
            self.is_right = is_right
    
        def __repr__(self):
            if self.is_right:
                hand = "right"
            else:
                hand = "left"
            s = "{} {} {}".format(self.size, self.color, hand)
            return(s)
    
        def __eq__(self, other):
            # Two gloves are equal only if all three attributes match
            return (isinstance(other, glove)
                    and other.size == self.size
                    and other.color == self.color
                    and other.is_right == self.is_right)
    
        def __hash__(self):
            return hash((self.size, self.color, self.is_right))
    
    
    gloves = [glove('med', 'black', False),
              glove('med', 'black', True),
              glove('lg', 'black', False),
              glove('lg', 'black', True),
              glove('med', 'brown', False),
              glove('med', 'brown', True),
              glove('lg', 'blue', False),
              glove('med', 'tan', False)]
    
    gloves_set = set(gloves)
    # A glove is unpaired if its mirror image (same size and color, other hand) is not in the set
    unpaired = [g for g in gloves if glove(g.size, g.color, not g.is_right) not in gloves_set]
    print(unpaired)
    

    Output:

    [lg blue left, med tan left]
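Both methods are needed because, in Python 3, a class that defines __eq__ but not __hash__ has its __hash__ set to None, so its instances cannot be put in a set. A minimal sketch with a hypothetical class EqOnly (not part of the original code):

```python
class EqOnly(object):
    """Hypothetical class that defines __eq__ but not __hash__."""

    def __init__(self, size):
        self.size = size

    def __eq__(self, other):
        return isinstance(other, EqOnly) and other.size == self.size


# Defining __eq__ without __hash__ implicitly sets __hash__ to None
print(EqOnly.__hash__)  # None

try:
    {EqOnly('med')}  # building a set requires hashing the element
    hashable = True
except TypeError as exc:
    hashable = False
    print(exc)
```

The __hash__ in the answer hashes exactly the attributes that __eq__ compares, which keeps the rule "equal objects have equal hashes" intact.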
    

    You could also use collections.namedtuple, which generates __repr__ for you and inherits __eq__ and __hash__ from tuple, so you don't have to write them.
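A minimal sketch of the namedtuple variant, assuming the field names match the original class (Glove with a capital G is a hypothetical name). The _replace method builds the mirror-image glove without modifying the original:

```python
from collections import namedtuple

# Field names mirror the attributes of the original glove class
Glove = namedtuple('Glove', ['size', 'color', 'is_right'])

gloves = [Glove('med', 'black', False),
          Glove('med', 'black', True),
          Glove('lg', 'black', False),
          Glove('lg', 'black', True),
          Glove('med', 'brown', False),
          Glove('med', 'brown', True),
          Glove('lg', 'blue', False),
          Glove('med', 'tan', False)]

gloves_set = set(gloves)
# _replace returns a new Glove with only is_right flipped
unpaired = [g for g in gloves
            if g._replace(is_right=not g.is_right) not in gloves_set]
print(unpaired)
# [Glove(size='lg', color='blue', is_right=False), Glove(size='med', color='tan', is_right=False)]
```

One caveat: because a namedtuple is a tuple, a Glove also compares equal to a plain tuple with the same values, e.g. Glove('med', 'black', False) == ('med', 'black', False) is True.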


    Here is an alternative that requires neither implementing __eq__ and __hash__ nor creating new objects:

    class glove(object):
        def __init__(self, size, color, is_right):
            self.size = size
            self.color = color
            self.is_right = is_right
    
        def __repr__(self):
            if self.is_right:
                hand = "right"
            else:
                hand = "left"
            s = "{} {} {}".format(self.size, self.color, hand)
            return(s)
    
    
    gloves = [glove('med', 'black', False),
              glove('med', 'black', True),
              glove('lg', 'black', False),
              glove('lg', 'black', True),
              glove('med', 'brown', False),
              glove('med', 'brown', True),
              glove('lg', 'blue', False),
              glove('med', 'tan', False)]
    
    # With plain dict
    glove_search = {}
    for g in gloves:
        glove_search.setdefault(g.size, {}).setdefault(g.color, {})[g.is_right] = True
    unpaired = [g for g in gloves
                if not glove_search.get(g.size, {}).get(g.color, {}).get(not g.is_right, False)]
    
    # Or, more idiomatically, with defaultdict
    from collections import defaultdict
    glove_search = defaultdict(lambda: defaultdict(lambda: defaultdict(bool)))
    for g in gloves:
        glove_search[g.size][g.color][g.is_right] = True
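    # Note: reading a missing key from a defaultdict inserts it (here with the value False); that is harmless for this check
    unpaired = [g for g in gloves if not glove_search[g.size][g.color][not g.is_right]]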
    
    print(unpaired)
    

    Output:

    [lg blue left, med tan left]
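If the nested dicts feel heavy, a flatter variant keys a single defaultdict(set) on the (size, color) tuple and records which hands were seen. This sketch also builds the warning messages described in the question; hands_by_key and messages are hypothetical names, and the message order follows dict insertion order (guaranteed from Python 3.7):

```python
from collections import defaultdict


class glove(object):
    def __init__(self, size, color, is_right):
        self.size = size
        self.color = color
        self.is_right = is_right


gloves = [glove('med', 'black', False),
          glove('med', 'black', True),
          glove('lg', 'black', False),
          glove('lg', 'black', True),
          glove('med', 'brown', False),
          glove('med', 'brown', True),
          glove('lg', 'blue', False),
          glove('med', 'tan', False)]

# Group by the attributes that define a pair; the value is the set of
# hands seen for that size/color (True = right, False = left).
hands_by_key = defaultdict(set)
for g in gloves:
    hands_by_key[(g.size, g.color)].add(g.is_right)

# Any size/color missing one of the two hands is an incomplete pair
messages = []
for (size, color), hands in hands_by_key.items():
    for is_right in (False, True):
        if is_right not in hands:
            hand = "right" if is_right else "left"
            messages.append("No {} {} {} glove found".format(hand, size, color))

for m in messages:
    print(m)
# No right lg blue glove found
# No right med tan glove found
```

Keying on the tuple means adding another pairing attribute only changes the key expression, not the depth of the dict.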