
Match each object in the first list with the object in the second list whose attribute values are equal


Imagine we have a finite number of districts, each with a finite number of houses. Each house has a number, and the houses in each district are numbered starting from 1. One man and one woman live in each house.
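Because numbering restarts at 1 in every district, a house number on its own is not unique; a house is identified only by the pair (district, house_number). A tiny illustration with made-up data (the `houses` list is hypothetical) shows what happens when only the number is used as a dictionary key:

```python
# Made-up houses: numbering restarts at 1 in each district.
houses = [("District 1", 1), ("District 1", 2), ("District 2", 1)]

# Keyed by house number only: the two houses numbered 1 collide,
# and the later entry silently overwrites the earlier one.
by_number = {number: (district, number) for district, number in houses}

# Keyed by the (district, house_number) pair: every house keeps its own entry.
by_pair = {(district, number): (district, number) for district, number in houses}

print(len(by_number))  # 2
print(len(by_pair))    # 3
```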

We use the following class to represent people:

class Person:
    def __init__(self, name, age, district, house_number):
        self.name = name
        self.age = age
        self.district = district
        self.house_number = house_number

We also have two lists of objects of this class, called men and women. To illustrate the structure of the lists, here is an example of adding an object to one of them:

men.append(Person("Alex", 22, "District 7", 71))

Assume the lists are already populated: all men are in the men list and all women are in the women list. Since there is a finite number of districts, a finite number of houses in each, and exactly one man and one woman in every house, the two lists have the same length. The objects in both lists are in random order.

Assume the amount of data is very large.
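To try different approaches on larger inputs, a hypothetical test-data generator can build both lists. The function `make_population`, the naming scheme, the age range, and the assumption that every district has the same number of houses are all illustrative choices, not part of the original problem; it only reproduces the stated shape of the data (one man and one woman per house, both lists shuffled independently).

```python
import random


class Person:
    def __init__(self, name, age, district, house_number):
        self.name = name
        self.age = age
        self.district = district
        self.house_number = house_number


def make_population(num_districts, houses_per_district, seed=0):
    """Hypothetical generator: one man and one woman per house.

    Assumes every district has the same number of houses (a simplification).
    """
    rng = random.Random(seed)  # seeded so runs are reproducible
    men, women = [], []
    for d in range(1, num_districts + 1):
        district = f"District {d}"
        # House numbers restart at 1 in every district.
        for house in range(1, houses_per_district + 1):
            men.append(Person(f"M{d}-{house}", rng.randint(18, 90), district, house))
            women.append(Person(f"W{d}-{house}", rng.randint(18, 90), district, house))
    # The problem states both lists are in random order.
    rng.shuffle(men)
    rng.shuffle(women)
    return men, women


men, women = make_population(num_districts=10, houses_per_district=1000)
```

Increasing `houses_per_district` makes the difference between a full scan per man and a dictionary lookup per man easy to see.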

The goal is to find every man in the men list who is older than a given age (the variable min_age) and to match him with the woman from the women list who lives in the same house.

All men found must go into the men_new list and their matching women into the women_new list. The two lists must be aligned: a man and a woman living in the same house must have the same index in men_new and women_new.
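The alignment requirement can be checked mechanically. Below is a minimal sketch of such a check; `is_aligned` is a hypothetical helper name, and `SimpleNamespace` objects stand in for `Person` instances because only the `district` and `house_number` attributes matter here.

```python
from types import SimpleNamespace


def is_aligned(men_new, women_new):
    """Hypothetical helper: True if both lists have the same length and the
    man and woman at every index live in the same district and house."""
    if len(men_new) != len(women_new):
        return False
    return all(
        m.district == w.district and m.house_number == w.house_number
        for m, w in zip(men_new, women_new)
    )


# Stand-in objects: same house number, but only a and b share a district.
a = SimpleNamespace(district="District 7", house_number=71)
b = SimpleNamespace(district="District 7", house_number=71)
c = SimpleNamespace(district="District 2", house_number=71)

print(is_aligned([a], [b]))  # True
print(is_aligned([a], [c]))  # False
```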

My current solution is:

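# Assume that the lists "men", "women" and the variable "min_age" are already defined.

men_new = []
women_new = []

for man in men:
    if man.age > min_age:
        men_new.append(man)

for man in men_new:
    # filter() returns a lazy iterator, so next() is needed to get the matching Person itself.
    # This still scans the whole women list for every man.
    women_new.append(next(filter(lambda x: x.district == man.district and x.house_number == man.house_number, women)))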

This solution works, but it is very slow on large amounts of data, because for every selected man it scans the entire women list. Are there any ways to solve this problem faster? Thanks in advance!


Solution

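  • Transform your women list into a dict mapping each house, identified by the pair (district, house number), to the woman who lives there. The house number alone is not enough, because numbering restarts at 1 in every district:

    house_to_woman = {}
    for w in women:
        house_to_woman[(w.district, w.house_number)] = w

    Then the last loop of your code no longer scans the whole women list; each man needs a single dictionary lookup, which takes constant time on average:

    for m in men_new:
        women_new.append(house_to_woman[(m.district, m.house_number)])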
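Putting the age filter and the dictionary lookup together gives a minimal end-to-end sketch. The function name `match_couples` and the sample people are hypothetical; the dictionary is keyed by (district, house_number) because house numbers repeat across districts.

```python
class Person:
    def __init__(self, name, age, district, house_number):
        self.name = name
        self.age = age
        self.district = district
        self.house_number = house_number


def match_couples(men, women, min_age):
    """Return (men_new, women_new), index-aligned, for men older than min_age."""
    # One pass over women: O(len(women)) to build the index.
    house_to_woman = {(w.district, w.house_number): w for w in women}
    # One pass over men, keeping their original relative order.
    men_new = [m for m in men if m.age > min_age]
    # One average O(1) lookup per selected man; raises KeyError if a house
    # has no woman, which the problem statement rules out.
    women_new = [house_to_woman[(m.district, m.house_number)] for m in men_new]
    return men_new, women_new


men = [
    Person("Alex", 22, "District 7", 71),
    Person("Boris", 45, "District 1", 1),
    Person("Carl", 30, "District 2", 1),
]
women = [
    Person("Dana", 40, "District 2", 1),
    Person("Eva", 21, "District 7", 71),
    Person("Fay", 44, "District 1", 1),
]

men_new, women_new = match_couples(men, women, min_age=25)
print([(m.name, w.name) for m, w in zip(men_new, women_new)])
# [('Boris', 'Fay'), ('Carl', 'Dana')]
```

The total work is roughly linear in len(men) + len(women), instead of growing with len(men_new) × len(women) as the filter-based loop does. The extra cost is memory for one dictionary entry per woman.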