Search code examples
javapythonmysqlsqlcaching

Java - Merge data from 2 tables in different data source


I have a use case, where I have to merge data from 2 Tables (which are in different data sources). Consider that each table has employee data (First name, last name, phoneNumber). For each employee, I will have to merge the phoneNumber data from both the Tables and other data will remain the same as in Table 1

  1. If employee data is present in both Table 1 and Table 2, phoneNumber data will be merged as a comma separated values and other data will be sourced from Table 1
  2. If employee is present only in Table 1, then entire data will be sourced from Table 1
  3. If employee is present only in Table 2, then entire data will be sourced from Table 2

These Tables has about 40 lack rows of data each. Close to 5GB. What is the best approach to do this in Java? My concern is if I pull the data from these 2 tables into Java cache, i will still have to loop thought the entire table 2 to see if an employee is present there as well. Or will a python script be better?

Table 1

EmployeeID FirstName LastName PhoneNumber
EM01 Jhon Doe 12345
EM02 Dave Joe 34567

Table 2:

EmployeeID FirstName LastName PhoneNumber
EM01 Jhon Doe 89000
EM03 Gabe Mai 45678

Table 3 (After merging the phone numbers):

EmployeeID FirstName LastName PhoneNumber
EM01 Jhon Doe 12345,89000
EM02 Dave Joe 34567
EM03 Gabe Mai 45678

Solution

  • You can easily do this as a SQL query.

    Basically, you want a full join, but -- alas -- MySQL doesn't support that.

    So, one method is:

    select t1.EmployeeID, t1.FirstName, t1.LastName
           concat_ws(',', t1.PhoneNumber, t2.PhoneNumber) as PhoneNumber
    from table1 t1 left join
         table2 t2
         on t1.EmployeeID = t2.EmployeeID
    union all
    select t2.EmployeeID, t2.FirstName, t2.LastName, t2.PhoneNumber
    from table2 t2
         table1 t1 left join
         on t1.EmployeeID = t1.EmployeeID
    where t1.EmployeeID is null;
    

    That is, get all the rows for the employees in table1. Then add in the additional rows from table2. For the first part, concat_ws() is convenient for combining the phone numbers.