Efficiently find objects satisfying relationship

Let's say I have some objects, like in this example (JSON code):

{
    "people" : {
        "Alice" : {
            "position" : "Manager",
            "company" : "Company1"
        },
        "Bob" : {
            "position" : "CEO",
            "company" : "Company1"
        },
        "Charlie" : {
            "position" : "CEO",
            "company" : "Company2"
        }
    },
    "companies" : [
        { "name" : "Company1" },
        { "name" : "Company2" }
    ]
}

I want to write a function get_X_of_Y(x, y) so that a call such as get_X_of_Y("CEO", companies[0]) returns Bob.

How could I efficiently do this for large datasets? I have the following function:

def get_X_of_Y(x, y):
    # Linear scan: compare every person against the requested position and company
    for name, person in people.items():
        if person["position"] == x and person["company"] == y["name"]:
            return name
    return None

Suppose I have thousands of people and hundreds of companies. Is there a faster way to do this than by looping through everyone? I can precompute the objects if there's a way to make things faster.

Solution

Let us say the data is stored in a dict:

data = {
    "people" : {
        "Alice" : {
            "position" : "Manager",
            "company" : "Company1"
        },
        "Bob" : {
            "position" : "CEO",
            "company" : "Company1"
        },
        "Charlie" : {
            "position" : "CEO",
            "company" : "Company2"
        }
    },
    "companies" : [
        { "name" : "Company1" },
        { "name" : "Company2" }
    ]
}

Then you could create a list of people, which is a flat structure compared to your nested dict:

>>> people = [(key, value["position"], value["company"]) for key, value in data["people"].items()]
>>> people
[('Alice', 'Manager', 'Company1'),
 ('Bob', 'CEO', 'Company1'),
 ('Charlie', 'CEO', 'Company2')]

And also a list of company names, which again does away with the dict structure:

>>> companies = [item['name'] for item in data["companies"]]
>>> companies
['Company1', 'Company2']

Now querying is pretty simple; just use filter:

def get_X_of_Y(x, y):
    # item is (name, position, company); list() is needed because filter is lazy in Python 3
    return list(filter(lambda item: item[1] == x and item[2] == y, people))

And so you can easily search now:

>>> get_X_of_Y("CEO", companies[0])
[('Bob', 'CEO', 'Company1')]
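
Note that filter still walks the whole list on every call. Since you said you can precompute, one option is to build a dict keyed by (position, company) once and then look names up directly. The following is only a sketch under that assumption; build_index and the index variable are names made up for illustration, and y is taken to be the company name string, as in the flattened companies list above.

```python
from collections import defaultdict

# Small sample in the same shape as data["people"] above
people_by_name = {
    "Alice": {"position": "Manager", "company": "Company1"},
    "Bob": {"position": "CEO", "company": "Company1"},
    "Charlie": {"position": "CEO", "company": "Company2"},
}

def build_index(people):
    """Hypothetical helper: map (position, company) to a list of names."""
    index = defaultdict(list)
    for name, info in people.items():
        index[(info["position"], info["company"])].append(name)
    return index

# Built once, in a single pass over all people
index = build_index(people_by_name)

def get_X_of_Y(x, y):
    # Average-case constant-time dict lookup instead of a scan;
    # .get avoids inserting an empty list for keys that are missing
    return index.get((x, y), [])

print(get_X_of_Y("CEO", "Company1"))
```

The index has to be rebuilt or updated whenever a person changes position or company, so this fits best when lookups are much more frequent than updates.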

However, I would still suggest using a database if you really have thousands of people and hundreds of companies.
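
As a rough idea of what the database route could look like, here is a minimal sketch using the sqlite3 module from the standard library with an in-memory database. The table name, column names, and index name are assumptions chosen for this example, not something taken from your data.

```python
import sqlite3

# In-memory database for illustration; a file path would make it persistent
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (name TEXT PRIMARY KEY, position TEXT, company TEXT)"
)
# Composite index so lookups by (position, company) can avoid a full table scan
conn.execute("CREATE INDEX idx_position_company ON people (position, company)")

conn.executemany(
    "INSERT INTO people (name, position, company) VALUES (?, ?, ?)",
    [
        ("Alice", "Manager", "Company1"),
        ("Bob", "CEO", "Company1"),
        ("Charlie", "CEO", "Company2"),
    ],
)
conn.commit()

def get_X_of_Y(x, y):
    # Parameterized query; returns the names of all matching people
    rows = conn.execute(
        "SELECT name FROM people WHERE position = ? AND company = ? ORDER BY name",
        (x, y),
    ).fetchall()
    return [name for (name,) in rows]

print(get_X_of_Y("CEO", "Company1"))
```

With a database, other relationships (for example, everyone at a given company) become additional queries rather than additional hand-built structures.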