Search code examples
sqldata-warehousetalenddimensional-modeling

How to flatten a one-to-many relationship


While trying to build a data warehousing application using Talend, we are faced with the following scenario.

We have two tables tables that look like

Table master

ID | CUST_NAME | CUST_EMAIL
------------------------------------
1  | FOO       | [email protected]

Events Table

ID | CUST_ID | EVENT_NAME  | EVENT_DATE      
---------------------------------------
1  | 1       | ACC_APPLIED | 2014-01-01
2  | 1       | ACC_OPENED  | 2014-01-02
3  | 1       | ACC_CLOSED  | 2014-01-02

There is a one-to-many relationship between master and the events table.Since, given a limited number of event names I proposing that we denormalize this structure into something that looks like

ID | CUST_NAME | CUST_EMAIL          | ACC_APP_DATE_ID | ACC_OPEN_DATE_ID |ACC_CLOSE_DATE_ID      
-----------------------------------------------------------------------------------------
1  | FOO       | [email protected] | 20140101        | 20140102         | 20140103

THE DATE_ID columns refer to entries inside the time dimension table.

First question : Is this a good idea ? What are the other alternatives to this scheme ?

Second question : How do I implement this using Talend Open Studio ? I figured out a way in which I moved the data for each event name into it's own temporary table along with cust_id using the tMap component and later linked them together using another tMap. Is there another way to do this in talend ?


Solution

  • To do this in Talend you'll need to first sort your data so that it is reliably in the order of applied, opened and closed for each account and then denormalize it to a single row with a single delimited field for the dates using the tDenormalizeRows component.

    After this you'll want to use tExtractDelimitedFields to split the single dates field.