Search code examples
algorithmsystem

How to answer find duplicates in array extension questions?


I was at a technical interview, given the question "find duplicates in an array" and I solved in O(n) time with a hashtable no problem, then I was given a barrage of follow up questions.

Orig: Determine if an array contains duplicate entries.

  F1: Now what if the array is very large, and had to be distributed across multiple machines.

  F2: What if the network connection between these machines are prone to failure?

  F3: What if the hardware themselves are not 100% reliable and may occasionally give off wrong answers?

  F4: Design a system so that multiple simultaneous users may need to update this array, while you need to maintain uniqueness of its entries.

I thought about F1 and said then it wouldn't be wise to use a huge hashtable and we could trade runtime to O(n²) to compensate with O(1) memory, but wasn't sure about the rest. Any help?


Solution

  • F2: you have to duplicate data over the different machines, you can chose all data or part of the data.

    F3: use checksum values when transferring data between machines.

    F4: use some kind of synchronisation (like semaphores) to make sure updating is not done simultaneously.