I'm building a flow network, and I have two lists of nodes that have properties associated with them. I have to connect both lists based on these properties.
What I'm doing at the moment is iterating through list A and, for each element, iterating over all elements of list B, connecting two nodes if they share at least one property. This means I have to visit all nodes in list B once for every single element of list A.
Is there a faster way of connecting the two lists?
Maybe keep a matrix of all properties and just work through the column of a property and connect based on that? That would use a lot more memory, though.
EDIT: For this problem, both lists contain nodes that each have a list of skills (say 1, 2, 3). I then need to connect each node of list A to the nodes of list B that have a matching skill, using that skill.
EDIT: I now give each person a stack of skills and each job a HashSet of the skills it needs. I then pop a skill and iterate over all jobs; if a job contains that skill, I connect the two.
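To make that concrete, here is a minimal sketch of the brute-force pairing I'm doing; the `Person` and `Job` types and the integer skill ids are hypothetical stand-ins for my actual node classes:

```java
import java.util.*;

class BruteForceMatcher {
    // Hypothetical node types: each person/job carries a set of skill ids.
    record Person(String name, Set<Integer> skills) {}
    record Job(String title, Set<Integer> skills) {}

    // O(|people| * |jobs|): list B (jobs) is scanned once per element of list A.
    static List<Map.Entry<Person, Job>> connect(List<Person> people, List<Job> jobs) {
        List<Map.Entry<Person, Job>> edges = new ArrayList<>();
        for (Person p : people) {
            for (Job j : jobs) {
                // Connect if they share at least one skill.
                if (!Collections.disjoint(p.skills(), j.skills())) {
                    edges.add(Map.entry(p, j));
                }
            }
        }
        return edges;
    }
}
```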
Without diving into the language, here are some thoughts on the algorithmics.
In some particular cases you can improve on this by pre-processing the lists, depending on your properties.
For example, let's assume your nodes have two properties: an id (int) and a weight, and you need a match on the id. Then, as a pre-processing step, you could sort your two lists A and B by id.
=> complexity: O(nA log nA) + O(nB log nB)
After that, you do the matching between the two lists with two indexes, iterating over each list only once. Assuming you sorted ascending by id, at each step you have xA, the iA-th element of sorted list sortA, with id idA, and xB, the iB-th element of sorted list sortB, with id idB, and:

- idA = idB: there is a match, do iA++ and iB++
- idA > idB: there is no match, do iB++
- idA < idB: there is no match, do iA++
=> complexity: O(nA + nB)
=> total complexity: O(nA log nA) + O(nB log nB) + O(nA + nB), whereas the brute-force complexity is O(nA · nB)
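Here is a minimal sketch of that merge step, assuming a hypothetical Node type and lists already sorted ascending by id:

```java
import java.util.*;

class SortedMatcher {
    record Node(int id, double weight) {}  // hypothetical node type

    // Assumes sortA and sortB are sorted ascending by id.
    static List<int[]> match(List<Node> sortA, List<Node> sortB) {
        List<int[]> matches = new ArrayList<>();  // pairs of indexes (iA, iB)
        int iA = 0, iB = 0;
        while (iA < sortA.size() && iB < sortB.size()) {
            int idA = sortA.get(iA).id(), idB = sortB.get(iB).id();
            if (idA == idB) {        // match: connect and advance both indexes
                matches.add(new int[]{iA, iB});
                iA++; iB++;
            } else if (idA > idB) {  // xB can no longer match anything
                iB++;
            } else {                 // idA < idB: xA can no longer match anything
                iA++;
            }
        }
        return matches;
    }
}
```

The pre-processing itself is just `list.sort(Comparator.comparingInt(Node::id))` on each list before calling `match`.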
You may find other ways to pre-process your lists based on your properties. The key here is that if you split the work into multiple steps, you may save time in the end thanks to additions instead of multiplications in your complexity.

I hope this helps.
PS: in some cases you can also trade space for time: store more pre-processed data to reduce time complexity. You could pre-process multiple aspects of your lists, but you would need to store more data, so your choice also depends on your constraints.
Edit (integrating the comments): With more space used (I don't know how many skills you have or how much space you can use to store pre-processed data), you could compute one list per skill, containing the id of every candidate having that skill (you could store all of that in a HashMap<Skill, List<Candidate>>). It takes O(nCandidates) to populate all those lists (only one iteration).
For example you would have:

skill1 -> [candidate1, candidate2]
skill3 -> [candidate2, candidate3, candidate5]
skill4 -> [candidate2, candidate3, candidate5]
...
Then, for every job, you take the candidates matching the first skill, filter them against the second skill's candidate list, and so on until the last skill; you end up with the list of all candidates matching all the skills. Theoretically it's still O(nCandidates) for each job, but in "real life" you can take as first skill the one with the fewest candidates and iterate through the skills in increasing order of candidate count, so it would be less than O(nCandidates).
For example, with the previous map and a job requiring skill1, skill3 and skill4, you would have:
=> operations: looking for candidate1 and candidate2 in skill1's [candidate1, candidate2], then looking for candidate1 and candidate2 in skill3's [candidate2, candidate3, candidate5] (only candidate2 remains), then looking for candidate2 in skill4's [candidate2, candidate3, candidate5]
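A sketch of that pre-processing and per-job intersection, assuming skills and candidates are plain strings (the names are illustrative, not from the question):

```java
import java.util.*;

class SkillIndexMatcher {
    // Pre-processing: one candidate set per skill; a single pass over all candidates.
    static Map<String, Set<String>> indexBySkill(Map<String, Set<String>> candidateSkills) {
        Map<String, Set<String>> bySkill = new HashMap<>();
        candidateSkills.forEach((candidate, skills) ->
            skills.forEach(skill ->
                bySkill.computeIfAbsent(skill, k -> new HashSet<>()).add(candidate)));
        return bySkill;
    }

    // Per job: intersect the candidate sets, starting with the smallest one.
    static Set<String> candidatesFor(Set<String> requiredSkills, Map<String, Set<String>> bySkill) {
        List<Set<String>> sets = new ArrayList<>();
        for (String skill : requiredSkills) {
            sets.add(bySkill.getOrDefault(skill, Set.of()));
        }
        if (sets.isEmpty()) return Set.of();
        sets.sort(Comparator.comparingInt(Set::size));  // fewest candidates first
        Set<String> result = new HashSet<>(sets.get(0));
        for (int i = 1; i < sets.size() && !result.isEmpty(); i++) {
            result.retainAll(sets.get(i));  // keep only candidates who also have this skill
        }
        return result;
    }
}
```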
The same thing can be done in reverse (a skill -> [jobs] map) to get the jobs matching the skills of a candidate.