I was at a technical interview and was given the question "find duplicates in an array". I solved it in O(n) time with a hashtable, no problem, but then I was given a barrage of follow-up questions.
Orig: Determine if an array contains duplicate entries.
F1: Now what if the array is very large and has to be distributed across multiple machines?
F2: What if the network connections between these machines are prone to failure?
F3: What if the hardware itself is not 100% reliable and may occasionally give wrong answers?
F4: Design a system in which multiple users can update this array simultaneously while you maintain the uniqueness of its entries.
For F1, I said it wouldn't be wise to build one huge hashtable, and that we could trade runtime for memory instead: O(n²) time with O(1) extra space by comparing elements pairwise. But I wasn't sure about the rest. Any help?
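For concreteness, here is a minimal sketch of the original O(n) hash-set check and the O(n²) time / O(1) extra-space alternative I described (assuming hashable entries and an array that fits on one machine):

```python
def has_duplicates_hashset(arr):
    """O(n) time, O(n) extra space: remember every value seen so far."""
    seen = set()
    for x in arr:
        if x in seen:
            return True
        seen.add(x)
    return False


def has_duplicates_pairwise(arr):
    """O(n^2) time, O(1) extra space: compare every pair of elements."""
    n = len(arr)
    for i in range(n):
        for j in range(i + 1, n):
            if arr[i] == arr[j]:
                return True
    return False
```

The second version avoids the big hashtable at the cost of quadratic time, which is the trade-off I had in mind for F1.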
F2: you have to replicate data across the different machines; you can replicate all of the data or only part of it (sketch for F2 below).
F3: use checksums when transferring data between machines so corrupted transfers can be detected (sketch for F3 below).
F4: use some kind of synchronisation (such as semaphores or locks) to make sure updates are not applied simultaneously (sketch for F4 below).
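For F2, a minimal sketch of reading a chunk of the array through its replicas with failover. All machine names, the replica map, and `fetch_chunk` are made up for illustration; a real system would use an actual RPC, and the random failure here only simulates a flaky network:

```python
import random

# Toy replica map: each chunk of the array is stored on several machines.
REPLICAS = {
    "chunk-0": ["machine-a", "machine-b", "machine-c"],
    "chunk-1": ["machine-b", "machine-c", "machine-d"],
}

# Toy storage: what each machine holds for each chunk.
STORAGE = {
    ("machine-a", "chunk-0"): [1, 2, 3],
    ("machine-b", "chunk-0"): [1, 2, 3],
    ("machine-c", "chunk-0"): [1, 2, 3],
    ("machine-b", "chunk-1"): [4, 5, 6],
    ("machine-c", "chunk-1"): [4, 5, 6],
    ("machine-d", "chunk-1"): [4, 5, 6],
}


def fetch_chunk(machine, chunk_id):
    """Stand-in for a real RPC; fails randomly to simulate an unreliable network."""
    if random.random() < 0.3:
        raise ConnectionError(f"{machine} unreachable")
    return STORAGE[(machine, chunk_id)]


def read_chunk_with_failover(chunk_id):
    """Try each replica in turn; any single link failure is tolerated."""
    last_error = None
    for machine in REPLICAS[chunk_id]:
        try:
            return fetch_chunk(machine, chunk_id)
        except ConnectionError as exc:
            last_error = exc  # this replica is unreachable, try the next one
    raise RuntimeError(f"all replicas of {chunk_id} failed") from last_error


print(read_chunk_with_failover("chunk-0"))
```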
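For F3, a minimal sketch of attaching a checksum to a payload before sending it and verifying it on receipt (here SHA-256 from Python's standard hashlib; a CRC would also do):

```python
import hashlib
import pickle


def pack(values):
    """Serialise a chunk of the array and attach a checksum of the bytes."""
    payload = pickle.dumps(values)
    checksum = hashlib.sha256(payload).hexdigest()
    return payload, checksum


def unpack(payload, checksum):
    """Verify the checksum before trusting the data; on mismatch, request a resend."""
    if hashlib.sha256(payload).hexdigest() != checksum:
        raise ValueError("checksum mismatch: payload corrupted in transit")
    return pickle.loads(payload)


payload, checksum = pack([1, 2, 3, 4])
print(unpack(payload, checksum))  # [1, 2, 3, 4]
```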
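For F4, a minimal single-process sketch of guarding updates with a lock so two users cannot insert the same value at the same time; the `UniqueArray` class is made up for illustration, and a distributed deployment would need a distributed lock or a single coordinator instead:

```python
import threading


class UniqueArray:
    """Array-like store that rejects duplicate entries; a lock serialises updates."""

    def __init__(self):
        self._items = []
        self._seen = set()
        self._lock = threading.Lock()  # a semaphore of size 1 behaves the same way

    def add(self, value):
        """Return True if the value was added, False if it was already present."""
        with self._lock:  # only one updater at a time runs the check-then-insert
            if value in self._seen:
                return False
            self._seen.add(value)
            self._items.append(value)
            return True


store = UniqueArray()
print(store.add(42))  # True
print(store.add(42))  # False, duplicate rejected
```

The important part is that the membership check and the insert happen inside the same critical section, otherwise two concurrent updaters could both pass the check and both insert the value.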