Search code examples
sqlpostgresqljoingaps-and-islands

Club two table based on certain condition in postgresql


Here's my sample input tables:

employee_id project effective_date**
1 A 2014-08-13
1 B 2016-12-21
1 C 2018-02-21
employee_id designation effective_date
1 trainee 2014-08-05
1 senior 2016-08-17
1 team leader 2018-02-05

Table1: describes an employee who undergoes different projects at different date's in an organization.

Table2: describes the same employee from Table1 who undergoes different designation in the same organisation.

Now I want an Expected output table like this:

employee_id project designation effective_date
1 A trainee 2014-08-13
1 A senior 2016-08-17
1 B Senior 2016-12-21
1 B team leader 2018-02-05
1 C team leader 2018-02-21

The fact is that whenever:

  • his project changes, I need to display project effective_date.
  • his designation changes, I need to display designation effective_date but with the project he worked on during this designation change

Solution

  • This problem falls into the gaps-and-islands taxonomy. This specific variant can be solved in three steps:

    • applying a UNION ALL of the two tables while splitting "tab1.project" and "tab2.role" in two separate fields within the same schema
    • compute the partitions, between a non-null value and following null values, with two running sums (one for the "designation" and one for "project")
    • apply two different aggregations on the two different fields, to remove the null values.
    WITH cte AS (
        SELECT employee_id, effective_date,
               project        AS project, 
               NULL           AS role         FROM tab1
        UNION ALL 
        SELECT employee_id, effective_date, 
               NULL           AS project, 
               designation    AS role         FROM tab2
    ), cte2 AS (
        SELECT *,
               COUNT(CASE WHEN project IS NOT NULL THEN 1 END) OVER(
                   PARTITION BY employee_id 
                   ORDER     BY effective_date
               ) AS project_partition,
               COUNT(CASE WHEN role IS NOT NULL THEN 1 END) OVER(
                   PARTITION BY employee_id 
                   ORDER     BY effective_date
               ) AS role_partition
        FROM cte    
    )
    SELECT employee_id, effective_date,
           MAX(project) OVER(PARTITION BY project_partition) AS project,
           MAX(role)    OVER(PARTITION BY role_partition)    AS role
    FROM cte2
    ORDER BY employee_id, effective_date
    

    Check the demo here.