Search code examples
algorithmlistdiffedit-distance

How to determine differences in two lists of data


This is an exercise for the CS guys to shine with the theory.

Imagine you have 2 containers with elements. Folders, URLs, Files, Strings, it really doesn't matter.

What is AN algorithm to calculate the added and the removed?

Notice: If there are many ways to solve this problem, please post one per answer so it can be analysed and voted up.

Edit: All the answers solve the matter with 4 containers. Is it possible to use only the initial 2?


Solution

  • Assuming you have two lists of unique items, and the ordering doesn't matter, you can think of them both as sets rather than lists

    If you think of a venn diagram, with list A as one circle and list B as the other, then the intersection of these two is the constant pool.

    Remove all the elements in this intersection from both A and B, and and anything left in A has been deleted, whilst anything left in B has been added.

    So, iterate through A looking for each item in B. If you find it, remove it from both A and B

    Then A is a list of things that were deleted, and B is a list of things that were added

    I think...

    [edit] Ok, with the new "only 2 container" restriction, the same still holds:

    foreach( A ) { 
      if( eleA NOT IN B ) {
        DELETED
      }
    }
    foreach( B ) {
      if( eleB NOT IN A ) {
        ADDED
      }
    }
    

    Then you aren't constructing a new list, or destroying your old ones...but it will take longer as with the previous example, you could just loop over the shorter list and remove the elements from the longer. Here you need to do both lists

    An I'd argue my first solution didn't use 4 containers, it just destroyed two ;-)