data-synchronization

Merging Data from a Different Source into a Database


I have to compare data from two different sources.

From a different source, I need to get college_id, student_id, and student_name, and I want to check whether they are up to date in my database. The source always has accurate data.

One college may have multiple records.

Every time I log in, I need to bring this information up to date in my database. How do I proceed?

Our team does not recommend the delete-and-reinsert option. So, how do I compare?

Can anyone provide some efficient pseudocode? Should I store the source information in a 2-D array in Java, in a list, or in something else?

If a record does not exist in the source but exists in the database, I need to delete it from the DB.

If a record exists in the source but does not exist in the DB, I need to insert it into the DB.

I would appreciate it if someone could provide some insight on whether to use a list or a 2-D array, with some pseudocode.

Thanks!


Solution

  • Basically, you need to

    1. Load all the records from the database
    2. Load all the records from the trusted source
    3. Find all the records in DB which are not in the trusted source any more. Delete those.
    4. Find all the records in trusted source which are not in DB. Add those.
    5. Find all the changed records. Update those.

    The problem is that you do not specify the primary key for your records, so #5 may be irrelevant.

    For all the others, you need a class which encapsulates a record and implements the equals() and hashCode() methods (properly!), plus a couple of collections and a knowledge of the removeAll() and retainAll() methods.
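
As a rough illustration of steps 3 and 4, here is a minimal sketch in Java. The names SyncSketch, StudentRecord, toDelete and toInsert are hypothetical, and it assumes that all three fields take part in equality, since the question does not name a primary key. Loading the two sets from the database and from the source is left out.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class SyncSketch {

    // Hypothetical record class. Assumption: all three fields define equality.
    static final class StudentRecord {
        final String collegeId;
        final String studentId;
        final String studentName;

        StudentRecord(String collegeId, String studentId, String studentName) {
            this.collegeId = collegeId;
            this.studentId = studentId;
            this.studentName = studentName;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof StudentRecord)) return false;
            StudentRecord other = (StudentRecord) o;
            return Objects.equals(collegeId, other.collegeId)
                    && Objects.equals(studentId, other.studentId)
                    && Objects.equals(studentName, other.studentName);
        }

        @Override
        public int hashCode() {
            // Must use the same fields as equals().
            return Objects.hash(collegeId, studentId, studentName);
        }

        @Override
        public String toString() {
            return collegeId + "/" + studentId + "/" + studentName;
        }
    }

    // Step 3: records in the DB that the source no longer has.
    static Set<StudentRecord> toDelete(Set<StudentRecord> db, Set<StudentRecord> source) {
        Set<StudentRecord> result = new HashSet<>(db); // copy, so db is not modified
        result.removeAll(source);
        return result;
    }

    // Step 4: records in the source that the DB does not have yet.
    static Set<StudentRecord> toInsert(Set<StudentRecord> db, Set<StudentRecord> source) {
        Set<StudentRecord> result = new HashSet<>(source);
        result.removeAll(db);
        return result;
    }

    public static void main(String[] args) {
        Set<StudentRecord> db = new HashSet<>(Arrays.asList(
                new StudentRecord("C1", "S1", "Ann"),
                new StudentRecord("C1", "S2", "Bob")));
        Set<StudentRecord> source = new HashSet<>(Arrays.asList(
                new StudentRecord("C1", "S1", "Ann"),
                new StudentRecord("C2", "S3", "Cy")));
        System.out.println("delete: " + toDelete(db, source));
        System.out.println("insert: " + toInsert(db, source));
    }
}
```

With the sample data in main, this prints "delete: [C1/S2/Bob]" and "insert: [C2/S3/Cy]". Because the name is part of equality in this sketch, a renamed student shows up as one delete plus one insert. If (college_id, student_id) turned out to be the key, equals() and hashCode() would use only those two fields, and the name comparison would become the separate update step.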

    Hope that helps.

    PS. It is indeed possible to do this incrementally, e.g. if you haven't got enough memory to hold the whole dataset. In this case, you will need to be able to read the records in sorted order, with an ordering compatible with the equivalence relation.
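
One way the incremental variant could look is a merge-style walk over two sorted streams, sketched below. The names OrderedMergeSketch, Row, ORDER and diff are hypothetical. The sketch assumes both iterators deliver rows sorted by the same comparator, and that the comparator returns 0 exactly when two rows count as the same record. Only the current row of each stream is held in memory.

```java
import java.util.Comparator;
import java.util.Iterator;
import java.util.function.Consumer;

class OrderedMergeSketch {

    // Hypothetical row type; only the comparator below is used for matching.
    static final class Row {
        final String collegeId;
        final String studentId;
        final String studentName;

        Row(String collegeId, String studentId, String studentName) {
            this.collegeId = collegeId;
            this.studentId = studentId;
            this.studentName = studentName;
        }
    }

    // Assumption: both inputs are sorted by exactly this ordering.
    static final Comparator<Row> ORDER = Comparator
            .comparing((Row r) -> r.collegeId)
            .thenComparing(r -> r.studentId)
            .thenComparing(r -> r.studentName);

    // Returns the next row, or null once the stream is exhausted.
    private static Row next(Iterator<Row> it) {
        return it.hasNext() ? it.next() : null;
    }

    // Walks both sorted streams once, reporting rows that exist on one side only.
    static void diff(Iterator<Row> db, Iterator<Row> source,
                     Consumer<Row> onDelete, Consumer<Row> onInsert) {
        Row d = next(db);
        Row s = next(source);
        while (d != null || s != null) {
            // An exhausted side behaves as if it were "greater than everything".
            int c = (d == null) ? 1 : (s == null) ? -1 : ORDER.compare(d, s);
            if (c < 0) {          // only in the DB: delete it
                onDelete.accept(d);
                d = next(db);
            } else if (c > 0) {   // only in the source: insert it
                onInsert.accept(s);
                s = next(source);
            } else {              // present on both sides: nothing to do
                d = next(db);
                s = next(source);
            }
        }
    }
}
```

In practice the DB iterator might wrap a query with an ORDER BY clause matching ORDER, and the two callbacks might issue the DELETE and INSERT statements. Both of those are assumptions about the surrounding setup and not part of the sketch.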