Search code examples
sqlpivotsnowflake-schema

How to transpose these extra job_title rows into their own columns?


software im using is snowflake

I have a table of an employees job assignments with the company. They can hold multiple jobs actively at once (null in end_date). I'd like to consolidate this into 1 row:

EMPLOYEE_ID START_DATE END_DATE JOB_TITLE
1442 7/30/24 Tutor
1442 7/30/24 Tutor
1442 6/28/24 Instructional Specialist
1442 5/1/24 6/27/24 Instructional Specialist
1442 12/16/21 7/29/24 Tutor
1442 12/16/21 Lead Instructor
1442 12/16/21 7/29/24 Tutor

If an employee has any null values in the end_date field, then Id like to only retrieve distinct job_titles (eliminate top 2 rows to 1 because both same job on same start date) 1-5 in desc order based on start_date like this:

EMPLOYEE_ID Job_Title_1 Job_Title_2 Job_Title_3 Job_Title_4 Job_Title_5
1442 Tutor Instructional Specialist Lead Instructor

now lets say this employee had no currently active jobs, the table would look like this:

EMPLOYEE_ID START_DATE END_DATE JOB_TITLE
1442 5/1/24 6/27/24 Instructional Specialist
1442 12/16/21 7/29/24 Tutor
1442 12/16/21 7/29/24 Tutor

in that case I'd like the table to look like this:

EMPLOYEE_ID Job_Title_1 Job_Title_2 Job_Title_3 Job_Title_4 Job_Title_5
1442 Instructional Specialist Tutor

Here is the query I am using, and it works, but it's not ordering the job_title 1-5 columns by desc start_date order:

WITH job_position_info_ADP AS (
    SELECT 
        'ADP' AS source, 
        CAST(w.associate_oid AS STRING) AS worker_id,
        CAST(w.id AS STRING) AS Employee_ID,
        TO_CHAR(wah._fivetran_start, 'MM/DD/YY') AS start_date,
        CASE 
            WHEN wah._fivetran_active = TRUE THEN NULL
            ELSE TO_CHAR(wah._fivetran_end, 'MM/DD/YY')
        END AS end_date,
        wah.job_title AS Job_Title,
        ROW_NUMBER() OVER (PARTITION BY CAST(w.id AS STRING) ORDER BY wah._fivetran_start DESC) AS rn
    FROM
        prod_raw.adp_workforce_now.worker w
    JOIN 
        prod_raw.adp_workforce_now.worker_report_to AS wr 
        ON w.id = wr.worker_id
    JOIN 
        prod_raw.adp_workforce_now.work_assignment_history AS wah 
        ON w.id = wah.worker_id
),
recent_jobs_with_null_end AS (
    SELECT 
        Employee_ID,
        Job_Title,
        ROW_NUMBER() OVER (PARTITION BY Employee_ID ORDER BY start_date DESC) AS rn
    FROM 
        job_position_info_ADP
    WHERE 
        end_date IS NULL
),
recent_jobs_all AS (
    SELECT 
        Employee_ID,
        Job_Title,
        ROW_NUMBER() OVER (PARTITION BY Employee_ID ORDER BY start_date DESC) AS rn
    FROM 
        job_position_info_ADP
)
SELECT
    Employee_ID,
    MAX(CASE WHEN rn = 1 THEN Job_Title END) AS Job_Title_1,
    MAX(CASE WHEN rn = 2 THEN Job_Title END) AS Job_Title_2,
    MAX(CASE WHEN rn = 3 THEN Job_Title END) AS Job_Title_3,
    MAX(CASE WHEN rn = 4 THEN Job_Title END) AS Job_Title_4,
    MAX(CASE WHEN rn = 5 THEN Job_Title END) AS Job_Title_5
FROM (
    SELECT * FROM recent_jobs_with_null_end
    UNION ALL
    SELECT * FROM recent_jobs_all
    WHERE Employee_ID NOT IN (SELECT Employee_ID FROM recent_jobs_with_null_end)
) AS combined
WHERE 
    Employee_ID = '1442'
GROUP BY
    Employee_ID;

Solution

  • This is a replicable code in SQL Server, including schema and data.

    The idea was to reorganize the work_assignment_history table to order the job positions by date without including the current positions (END_DATE is null).

    Then, in job_unique, duplicates are removed, and the line number is recalculated. It was discovered that the initial line number is different after removing duplicates, and the date transformed into characters makes it difficult to calculate the new line number.

    Finally, the PIVOT operation is applied to convert rows into columns.

    ;WITH job_position_info_ADP AS (
        SELECT 
            CAST(wah.EMPLOYEE_ID AS varchar(10)) AS Employee_ID,
            wah.START_DATE AS start_date,
            wah.job_title AS Job_Title,
            ROW_NUMBER() OVER (PARTITION BY wah.EMPLOYEE_ID ORDER BY wah.START_DATE DESC) AS rn
        FROM
            [development].[dbo].work_assignment_history AS wah
        WHERE 
            wah.END_DATE is not null
    ),
    job_unique as 
    (
        select Employee_ID, start_date, Job_Title, max(rn) as rn,
            ROW_NUMBER() OVER (PARTITION BY EMPLOYEE_ID ORDER BY start_date DESC) AS new_rn
        from job_position_info_ADP
        group by Employee_ID, start_date, Job_Title
    )
    SELECT Employee_ID, [1], [2], [3], [4],[5]
    FROM
    (
        SELECT Employee_ID, job_title,new_rn
        FROM job_unique
    ) AS SourceTable
    PIVOT
    (
        MAX(job_title)
        FOR new_rn IN ([1], [2], [3], [4],[5])
    ) AS PivotTable;
    

    Output

    Employee_ID  1                              2       3       4      5
    1442         Especialista en Instrucción    Tutor   NULL    NULL    NULL
    

    I hope it is easy to translate to Snowflake, as most SQL languages have PIVOT operations with syntax changes.

    Schema

    CREATE TABLE [dbo].[work_assignment_history ](
        [EMPLOYEE_ID] [smallint] NOT NULL,
        [START_DATE] [date] NOT NULL,
        [END_DATE] [date] NULL,
        [JOB_TITLE] [nvarchar](50) NOT NULL
    ) ON [PRIMARY]
    GO
    

    Data

    EMPLOYEE_ID;START_DATE;END_DATE;JOB_TITLE
    1442;7/30/24;;Tutor
    1442;7/30/24;;Tutor
    1442;6/28/24;;Especialista en Instrucción
    1442;5/1/24;6/27/24;Especialista en Instrucción
    1442;12/16/21;7/29/24;Tutor
    1442;12/16/21;;Instructor Principal
    1442;12/16/21;7/29/24;Tutor
    

    Note To understand the question and develop a solution, the data was translated into Spanish. My apologies.