Let's say I have some objects, like in this example (JSON code):
{
    "people" : {
        "Alice" : {
            "position" : "Manager",
            "company" : "Company1"
        },
        "Bob" : {
            "position" : "CEO",
            "company" : "Company1"
        },
        "Charlie" : {
            "position" : "CEO",
            "company" : "Company2"
        }
    },
    "companies" : [
        { "name" : "Company1" },
        { "name" : "Company2" }
    ]
}
And I want to write a function get_X_of_Y(x, y) that I could call, for example, as get_X_of_Y("CEO", companies[0]) and have it return Bob.
How could I efficiently do this for large datasets? I have the following function:
def get_X_of_Y(x, y):
    for name, person in people.items():
        if person["position"] == x and person["company"] == y["name"]:
            return name
    return None
Suppose I have thousands of people and hundreds of companies. Is there a faster way to do this than by looping through everyone? I can precompute the objects if there's a way to make things faster.
Let us say
data = {
    "people" : {
        "Alice" : {
            "position" : "Manager",
            "company" : "Company1"
        },
        "Bob" : {
            "position" : "CEO",
            "company" : "Company1"
        },
        "Charlie" : {
            "position" : "CEO",
            "company" : "Company2"
        }
    },
    "companies" : [
        { "name" : "Company1" },
        { "name" : "Company2" }
    ]
}
Then you could create a list of people, which is basically a flat structure compared to your nested dict:
>>> people = [(key, value["position"], value["company"]) for key, value in data["people"].items()]
>>> people
[('Alice', 'Manager', 'Company1'),
 ('Bob', 'CEO', 'Company1'),
 ('Charlie', 'CEO', 'Company2')]
And also a list of companies, which again does away with the dict structure:
>>> companies = [item['name'] for item in data["companies"]]
>>> companies
['Company1', 'Company2']
Now querying is pretty simple with the built-in filter function (in Python 3, filter returns a lazy iterator, so wrap it in list() to get a list back):
def get_X_of_Y(x, y):
    return list(filter(lambda item: item[1] == x and item[2] == y, people))
Now you can search easily:
>>> get_X_of_Y("CEO", companies[0])
[('Bob', 'CEO', 'Company1')]
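Note that filter still scans every person on each call, so this is O(n) per query just like your loop. Since you said you can precompute, another option is to build a dictionary keyed by (position, company) once, making each lookup O(1) on average. A minimal sketch under that assumption (the index variable and the company-name argument convention are my own, not from your code):

```python
from collections import defaultdict

data = {
    "people": {
        "Alice": {"position": "Manager", "company": "Company1"},
        "Bob": {"position": "CEO", "company": "Company1"},
        "Charlie": {"position": "CEO", "company": "Company2"},
    },
    "companies": [{"name": "Company1"}, {"name": "Company2"}],
}

# Build the index once: (position, company) -> list of matching names.
index = defaultdict(list)
for name, info in data["people"].items():
    index[(info["position"], info["company"])].append(name)

def get_X_of_Y(x, y):
    # Here y is a company name string, e.g. "Company1".
    return index.get((x, y), [])

print(get_X_of_Y("CEO", "Company1"))  # ['Bob']
```

Rebuilding the index costs O(n) once, which pays off quickly if you query far more often than the data changes.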
However, I would still suggest using a database if you really have thousands of people and hundreds of companies.