I have a use case, where I have to merge data from 2 Tables (which are in different data sources). Consider that each table has employee data (First name, last name, phoneNumber). For each employee, I will have to merge the phoneNumber data from both the Tables and other data will remain the same as in Table 1
These Tables has about 40 lack rows of data each. Close to 5GB. What is the best approach to do this in Java? My concern is if I pull the data from these 2 tables into Java cache, i will still have to loop thought the entire table 2 to see if an employee is present there as well. Or will a python script be better?
Table 1
EmployeeID | FirstName | LastName | PhoneNumber |
---|---|---|---|
EM01 | Jhon | Doe | 12345 |
EM02 | Dave | Joe | 34567 |
Table 2:
EmployeeID | FirstName | LastName | PhoneNumber |
---|---|---|---|
EM01 | Jhon | Doe | 89000 |
EM03 | Gabe | Mai | 45678 |
Table 3 (After merging the phone numbers):
EmployeeID | FirstName | LastName | PhoneNumber |
---|---|---|---|
EM01 | Jhon | Doe | 12345,89000 |
EM02 | Dave | Joe | 34567 |
EM03 | Gabe | Mai | 45678 |
You can easily do this as a SQL query.
Basically, you want a full join
, but -- alas -- MySQL doesn't support that.
So, one method is:
select t1.EmployeeID, t1.FirstName, t1.LastName
concat_ws(',', t1.PhoneNumber, t2.PhoneNumber) as PhoneNumber
from table1 t1 left join
table2 t2
on t1.EmployeeID = t2.EmployeeID
union all
select t2.EmployeeID, t2.FirstName, t2.LastName, t2.PhoneNumber
from table2 t2
table1 t1 left join
on t1.EmployeeID = t1.EmployeeID
where t1.EmployeeID is null;
That is, get all the rows for the employees in table1
. Then add in the additional rows from table2
. For the first part, concat_ws()
is convenient for combining the phone numbers.