I have a list of objects. These objects have nested attributes, which were generated in accordance with a method by hpaulj's response in this post: Object-like attribute access for nested dictionary.
I want to be able to find attributes, and values of attributes within these objects, and manipulate data they hold. However, in the real scenario, there may be upwards of one million objects, and the attributes may be deeply nested, which makes searching through a flat list a costly exercise when lots of manipulations need to occur.
For example, imagine the list of objects is as follows:
list_of_objects = [object1, object2, object3, object4]
self.country = "Kenya", self.disease = "breast cancer"
self.country = "Kenya", self.disease = "diabetes"
self.country = 'Ireland', self.risk_factor.smoking = "Current"
self.country = 'Kenya', self.risk_factor.smoking = "Previous"
These objects were created from the following State
class:
class State:
def __init__(self, state_dictionary):
self._state_dictionary = state_dictionary
for key, value in self._state_dictionary.items():
if isinstance(value, dict):
value = State(value)
setattr(self, key, value)
An example state_dictionary
would be as follows, in the case of object3:
state_dictionary = {
"country":"Ireland",
"risk_factor":{
"smoking":"Current"
}
}
Importantly, nested attributes are State objects too.
I would like to affect all objects that possess an attribute, set of nested attributes, or possess an attribute with a specified value.
My thought, was to create a 'Controller', which would store that original list as separate lists within an object instance of the Controller class. Each of the original attributes and values would point to a list of the objects which contained those attributes or values, the basic design would be as follows:
class Controller:
def __init__(self, list_of_objects):
self.list_of_objects = list_of_objects # Our list of objects from above
self.create_hierarchy_of_objects()
def create_hierarchy_of_objects(self):
for o in self.list_of_objects:
# Does something here
After the create_hierarchy_of_objects
method has run, I would be able to do the following operations:
Controller.country.Kenya
would contain the list of all objects whose self.country is "Kenya", i.e object1, object2, object4Controller.disease
would contain the list of all objects who had the attribute self.disease i.e. object1 and object2Controller.risk_factor.smoking.Current
would contain the list of objects who had that set of attributes i.e. object3The question is how would create_hierarchy_of_objects
work?
I few points of clarification
self.risk_factor.attribute1.attribute2 = "foo"
self.risk_factor.attribtue3.attribute4 = "foo"
as well.If you have to deal with upwards of one million objects, generating an additional hierarchy is probably not the best solution. It would require many additional objects and waste a lot of time for just creating the hierarchy. The hierarchy would also need to be updated whenever list_of_objects
change.
Therefore, I suggest a more generic and dynamic approach by using iterators and a XPath like principle. Let's call it OPath
. The OPath
class is a lightweight object, which simply concatenates the attributes to a kind of attribute path. It also keeps a reference to the original list of entry objects. And finally, it is based on attributes only and so works for any type of object.
The actual lookup happens, when we start iterating through the OPath
object (e.g., putting the object into a list()
, using a for
-loop, ...). The OPath
returns an iterator, which recursively looks up the matching objects according to the attribute path based on the actual content in the originally supplied list. It yield
s one matching object after another to avoid the creation of unnecessary lists with entirely populated matching objects.
class OPath:
def __init__(self, objects, path = []):
self.__objects = objects
self.__path = path
def __getattr__(self, __name):
return OPath(self.__objects, self.__path + [__name])
def __iter__(self):
yield from (__object for __object in self.__objects if self.__matches(__object, self.__path))
@staticmethod
def __matches(__object, path):
if path:
if hasattr(__object, path[0]):
return OPath.__matches(getattr(__object, path[0]), path[1:])
if __object == path[0] and len(path) <= 1:
return True
return False
return True
if __name__ == '__main__':
class State:
def __init__(self, state_dictionary):
self._state_dictionary = state_dictionary
for key, value in self._state_dictionary.items():
if isinstance(value, dict):
value = State(value)
setattr(self, key, value)
o1 = State({ "country":"Kenya", "disease": "breast cancer" })
o2 = State({ "country":"Kenya", "disease": "diabetes" })
o3 = State({ "country":"Ireland", "risk_factor": { "smoking":"Current" } })
o4 = State({ "country":"Kenya", "risk_factor": { "smoking":"Previous" } })
# test cases with absolute paths
print("Select absolute")
path = OPath([o1, o2, o3, o4])
print(list(path) == [o1, o2, o3, o4])
print(list(path.country) == [o1, o2, o3, o4])
print(list(path.country.Kenya) == [o1, o2, o4])
print(list(path.disease) == [o1, o2])
print(list(path.disease.diabetes) == [o2])
print(list(path.risk_factor.smoking) == [o3, o4])
print(list(path.risk_factor.smoking.Current) == [o3])
print(list(path.doesnotexist.smoking.Current) == [])
print(list(path.risk_factor.doesnotexist.Current) == [])
print(list(path.risk_factor.smoking.invalidvalue) == [])
print(list(path.risk_factor.doesnotexist.Current.invalidpath) == [])
# test cases with relative paths
country = OPath([o1, o2, o3, o4], ["country"])
print("Select relative from country:")
print(list(country) == [o1, o2, o3, o4])
print(list(country.Kenya) == [o1, o2, o4])
print("Select all with country=Kenya")
kenya = OPath([o1, o2, o3, o4], ['country', 'Kenya'])
print(list(kenya) == [o1, o2, o4])
Output is expected to be True
for all test cases.