Search code examples
pythonrecursive-descentrecursive-datastructures

Recursively dir() a python object to find values of a certain type or with a certain value


I have a complex Python data structure (if it matters, it's a large music21 Score object) which will not pickle due to the presence of a weakref somewhere deep inside the object structure. I've debugged such problems with the stack trace and python debugger before, but it's always a big pain. Is there a tool which runs dir() recursively on all attributes of an object, finding objects hidden in lists, tuples, dicts, etc., and returns those that match a certain value (a lambda function or something like that). A big problem is recursive references, so some sort of memo function (like copy.deepcopy uses) is needed. I tried:

import weakref
def findWeakRef(streamObj, memo=None):
    weakRefList = []
    if memo is None:
        memo = {}
    for x in dir(streamObj):
        xValue = getattr(streamObj, x)
        if id(xValue) in memo:
            continue
        else:
            memo[id(xValue)] = True
        if type(xValue) is weakref.ref:
            weakRefList.append(x, xValue, streamObj)
        if hasattr(xValue, "__iter__"):
            for i in xValue:
                if id(i) in memo:
                    pass
                else:
                    memo[id(i)] = True
                    weakRefList.extend(findWeakRef(i), memo)
        else:
            weakRefList.extend(findWeakRef(xValue), memo)
    return weakRefList

I can probably continue plugging holes in this (the iter isn't what I'd want for dicts, for instance), but before I throw more time into it, wondering if someone knows an easier answer. It could be a pretty useful general tool.


Solution

  • This seems to be the start of an answer. I had to backport some items from the Python 3.2 inspect.getattr_static to make it work so it didn't call properties that just kept generating new objects. Here's the code I came up with:

    #-------------------------------------------------------------------------------
    # Name:         treeYield.py
    # Purpose:      traverse a complex datastructure and yield elements
    #               that fit a given criteria
    #
    # Authors:      Michael Scott Cuthbert
    #
    # Copyright:    Copyright © 2012 Michael Scott Cuthbert
    # License:      CC-BY
    #-------------------------------------------------------------------------------
    import types
    
    class TreeYielder(object):
        def __init__(self, yieldValue = None):
            '''
            `yieldValue` should be a lambda function that
            returns True/False or a function/method call that
            will be passed the value of a current attribute
            '''        
            self.currentStack = []
            self.yieldValue = yieldValue
            self.stackVals = []
            t = types
            self.nonIterables = [t.IntType, t.StringType, t.UnicodeType, t.LongType,
                                 t.FloatType, t.NoneType, t.BooleanType]
    
        def run(self, obj, memo = None):
            '''
            traverse all attributes of an object looking
            for subObjects that meet a certain criteria.
            yield them.
    
            `memo` is a dictionary to keep track of objects
            that have already been seen
    
            The original object is added to the memo and
            also checked for yieldValue
            '''
            if memo is None:
                memo = {}
            self.memo = memo
            if id(obj) in self.memo:
                self.memo[id(obj)] += 1
                return
            else:
                self.memo[id(obj)] = 1
    
            if self.yieldValue(obj) is True:
                yield obj
    
    
            ### now check for sub values...
            self.currentStack.append(obj)
    
            tObj = type(obj)
            if tObj in self.nonIterables:
                pass
            elif tObj == types.DictType:
                for keyX in obj:
                    dictTuple = ('dict', keyX)
                    self.stackVals.append(dictTuple)
                    x = obj[keyX]
                    for z in self.run(x, memo=memo):
                        yield z
                    self.stackVals.pop()
    
            elif tObj in [types.ListType, types.TupleType]:
                for i,x in enumerate(obj):
                    listTuple = ('listLike', i)
                    self.stackVals.append(listTuple)
                    for z in self.run(x, memo=memo):
                        yield z
                    self.stackVals.pop()
    
            else: # objects or uncaught types...
                ### from http://bugs.python.org/file18699/static.py
                try:
                    instance_dict = object.__getattribute__(obj, "__dict__")
                except AttributeError:
                    ## probably uncaught static object
                    return
    
                for x in instance_dict:
                    try:
                        gotValue = object.__getattribute__(obj, x)
                    except: # ?? property that relies on something else being set.
                        continue
                    objTuple = ('getattr', x)
                    self.stackVals.append(objTuple)
                    try:
                        for z in self.run(gotValue, memo=memo):
                            yield z
                    except RuntimeError:
                        raise Exception("Maximum recursion on:\n%s" % self.currentLevel())
                    self.stackVals.pop()                
    
            self.currentStack.pop()
    
        def currentLevel(self):
            currentStr = ""
            for stackType, stackValue in self.stackVals:
                if stackType == 'dict':
                    if isinstance(stackValue, str):
                        currentStr += "['" + stackValue + "']"
                    elif isinstance(stackValue, unicode):
                        currentStr += "[u'" + stackValue + "']"
                    else: # numeric key...
                        currentStr += "[" + str(stackValue) + "]"
                elif stackType == 'listLike':
                    currentStr += "[" + str(stackValue) + "]"
                elif stackType == 'getattr':
                    currentStr += ".__getattribute__('" + stackValue + "')"
                else:
                    raise Exception("Cannot get attribute of type %s" % stackType)
            return currentStr
    

    This code lets you run something like this:

    class Mock(object):
        def __init__(self, mockThing, embedMock = True):
            self.abby = 30
            self.mocker = mockThing
            self.mockList = [mockThing, mockThing, 40]
            self.embeddedMock = None
            if embedMock is True:
                self.embeddedMock = Mock(mockThing, embedMock = False)
    
    mockType = lambda x: x.__class__.__name__ == 'Mock'
    
    subList = [100, 60, -2]
    myList = [5, 20, [5, 12, 17], 30, {'hello': 10, 'goodbye': 22, 'mock': Mock(subList)}, -20, Mock(subList)]
    myList.append(myList)
    
    ty = TreeYielder(mockType)
    for val in ty.run(myList):
        print(val, ty.currentLevel())
    

    And get:

    (<__main__.Mock object at 0x01DEBD10>, "[4]['mock']")
    (<__main__.Mock object at 0x01DEF370>, "[4]['mock'].__getattribute__('embeddedMock')")
    (<__main__.Mock object at 0x01DEF390>, '[6]')
    (<__main__.Mock object at 0x01DEF3B0>, "[6].__getattribute__('embeddedMock')")
    

    Or run:

    high = lambda x: isinstance(x, (int, float)) and x > 10
    ty = TreeYielder(high)
    for val in ty.run(myList):
        print(val, ty.currentLevel())
    

    And get:

    (20, '[1]')
    (12, '[2][1]')
    (17, '[2][2]')
    (30, '[3]')
    (22, "[4]['goodbye']")
    (100, "[4]['mock'].__getattribute__('embeddedMock').__getattribute__('mocker')[0]")
    (60, "[4]['mock'].__getattribute__('embeddedMock').__getattribute__('mocker')[1]")
    (40, "[4]['mock'].__getattribute__('embeddedMock').__getattribute__('mockList')[2]")
    

    I'm still trying to figure out why .abby isn't found, but I figure it's worth posting even at this point, since it's much more on the right track than I was when I started.