Search code examples
sqlsql-serverunpivot

Unpivot unknown number of rows and column header row


I need to transform this table:

enter image description here

into this table:

enter image description here

for a report. The column headers must be the first row, and the total number of rows is unknown. The column header names in the translated table are not important to the result. They may simply be (A,B,C)

I created a simple sample case in place of the actual business table. This is the code so far:

If(OBJECT_ID('tempdb..#candidates') Is Not Null) Drop Table #candidates
CREATE TABLE #candidates
(LAST_NAME  VARCHAR(128),FIRST_NAME VARCHAR(128),MIDDLE_NAME VARCHAR(128))

INSERT INTO #candidates
Values ('Doe', 'John', 'J'), ('Doe', 'Jane', 'S')

SELECT * FROM #candidates

-- First Attempt via UNPIVOT
SELECT A, B
FROM(
      SELECT LAST_NAME,FIRST_NAME,MIDDLE_NAME       
      from #candidates c
)p
unpivot(
      --A will hold the columns
      B for A IN (
      LAST_NAME,FIRST_NAME,MIDDLE_NAME     
      )
) AS u

-- Second Attempt via CROSS APPLY
Select X.A,X.B
From #candidates C
Cross APPLY
(
    Values
    ('LAST_NAME', t.LAST_NAME),
    ('FIRST_NAME', t.FIRST_NAME),
    ('MIDDLE_NAME', T.MIDDLE_NAME)
)X (A,B);

which both result in this table:

enter image description here

I have found a solution to achieve the proper format, but it requires the number of rows to be known in advance and selecting the columns manually:

SELECT Distinct 'LAST_NAME' as [column],
PersonA = (Select Last_Name FROM (Select ROW_NUMBER() over(order by Last_Name) as RowNum, *
From #candidates)u WHERE u.RowNum = 1),
PersonB = (Select Last_Name FROM (Select ROW_NUMBER() over(order by Last_Name) as RowNum, *
From #candidates)u WHERE u.RowNum = 2)
FROM #candidates
UNION
SELECT Distinct 'First_Name' as [column],
PersonA = (Select First_Name FROM (Select ROW_NUMBER() over(order by First_Name) as RowNum, *
From #candidates)u WHERE u.RowNum = 1),
PersonB = (Select First_Name FROM (Select ROW_NUMBER() over(order by First_Name) as RowNum, *
From #candidates)u WHERE u.RowNum = 2)
FROM #candidates
UNION
SELECT Distinct 'Middle_Name' as [column],
PersonA = (Select Middle_Name FROM (Select ROW_NUMBER() over(order by Middle_Name) as RowNum, *
From #candidates)u WHERE u.RowNum = 1),
PersonB = (Select Middle_Name FROM (Select ROW_NUMBER() over(order by Middle_Name) as RowNum, *
From #candidates)u WHERE u.RowNum = 2)

How can I modify this code to achieve the correct format and account for unknown rows?


Solution

  • Unpivoting without the aid of "unvpivot":

    I want to expand on a comment, and solution, by John Cappelletti who suggests a solution that can be found here and specifically here (+1 !!). It is a very adaptable solution, plus it eradicates problem of converting multiple data types into a single column. However, I think it is worth examining the solution, so it is more clearly understood.:

    Here is an overall query transforms a conventional table into key/value pairs:

    /* unpivot into key value pairs generated via JSON */
    
    SELECT
          A.[id] AS ID -- the primary key of the source table
        , B.[Key]
        , B.[Value]
    FROM Person A
    CROSS APPLY (
        SELECT *
        FROM OpenJson((
                    /* list out the columns you need to be unpivoted into keys here */
                    SELECT
                          A.FirstName
                        , A.MiddleName
                        , A.LastName
                        , A.PersonType
                        , A.ModifiedDate
                        , A.Address
                    FOR JSON Path
                        , Without_Array_Wrapper
                    ))
        ) B
    

    In more detail, the innermost part of the cross apply is:

    select A.col1, A.col2, A.etc FOR JSON Path, Without_Array_Wrapper
    

    What FOR JSON Path, Without_Array_Wrapper does is convert the nominated columns into a JSON string (without the [ or ]). Importantly this automatically converts all data types into strings e.g. a datetime value becomes "2023-05-27T03:35:55.753". For each source column the JSON contains the name of that column and the value in that column for the row. e.g. you get something like this:

    {"id":1,"firstname":"John","lastname":"Doe","age":30,"ModifiedDate":"2023-05-27T03:35:55.753","address":"{"street": "123 Main St", "city": "Springfield", "state": "IL"}"}

    Once this is performed, then the query uses FROM OpenJson on that JSON string which transforms the single string into rows like this:

    +--------------+-----------------------------------------------------------------+------+
    |     key      |                              value                              | type |
    +--------------+-----------------------------------------------------------------+------+
    | id           | 1                                                               |    2 |
    | firstname    | John                                                            |    1 |
    | lastname     | Doe                                                             |    1 |
    | age          | 30                                                              |    2 |
    | ModifiedDate | 2023-05-27T03:35:55.753                                         |    1 |
    | address      | {"street": "123 Main St", "city": "Springfield", "state": "IL"} |    1 |
    +--------------+-----------------------------------------------------------------+------+
    

    Then the cross apply joins the key and value columns of the multiple rows to the id the source table (or, to whatever column is the primary key). Hence, you get unpivoted key/value pair result.

    NB: This logic can be applied to almost any table or select query/view with very little effort.

    Demo of the above JSON unpivot:

    In the above I have suggested using an approach that lists the source columns that you do want in the final result. An alternative approach is to instead select all columns for conversion to JSON, but then optionally exclude any that you do not want before converting the JSON into rows. e.g.

    SELECT
          A.[id] AS ID -- the primary key of the source table
        , B.[Key]
        , B.[Value]
    FROM Person A
    CROSS APPLY (
        SELECT *
        FROM OpenJson((
                    SELECT A.*
                    FOR JSON Path
                        , Without_Array_Wrapper
                    ))
        /*
         * list columns of the source table you do NOT want in the key/value pairs
         */
        WHERE [Key] NOT IN (
                  'id' -- the primary key of the source table
                , 'EmailPromotion'
                , 'AdditionalContactInfo'
                , 'Demographics'
                , 'BusinessEntityID'
                )
        ) B
    

    For performance I prefer the approach of only including the columns you need, but the alternative, excluding some columns, could be simpler to write and fine if performance isn't of concern.

    These queries (and others) above can also be examined at this fiddle:


    This question also requires an additional step which is to re-arrange the unpivoted data into rows, with a potentially unknown number of rows. For this we are going to need "dynamic sql" (to cater for the unknown columns) and we also need to "re-pivot" the data to force the data into rows with the multiple columns containing the data.

    For this I will use the JSON based unpivot approach explained earlier: because it absolutely resolves any multiple data type issues hence allowing the final pivot to work with any source data types.

    DECLARE @columns NVARCHAR(MAX)
    DECLARE @sql NVARCHAR(MAX)
    
    -- Generate the dynamic @columns for the pivot operation
    SET @columns = (
        SELECT STRING_AGG(QUOTENAME(id), ', ')
        FROM (
            SELECT ROW_NUMBER() OVER (ORDER BY LAST_NAME, FIRST_NAME, MIDDLE_NAME) AS id
            FROM #candidates
        ) AS ids
    )
    
      -- Generate the final dynamic SQL for the pivot operation
    SET @sql = N'
    SELECT
        [Key], ' + @columns + '
    FROM (
        SELECT
              A.[id]
            , B.[Key]
            , B.[Value]
        FROM (
          SELECT *
               , row_number() over(order by LAST_NAME, FIRST_NAME, MIDDLE_NAME) as id
          FROM #candidates
          ) A
        CROSS APPLY (
            SELECT *
            FROM OpenJson((
                        /* list out the columns you need to be unpivoted into keys here */
                        SELECT
                              A.LAST_NAME
                            , A.FIRST_NAME
                            , A.MIDDLE_NAME
                        FOR JSON Path
                            , Without_Array_Wrapper
                        ))
            ) B
         ) AS src
    PIVOT (
        MAX(Value)
        FOR id IN (' + @columns + ')
        ) AS piv;
    '
    
    -- SELECT @sql;  -- optional, inspect the generated SQL
    
    -- Execute the final dynamic SQL with the pivot operation
    EXEC sp_executesql @sql;
    
    Key 1 2
    FIRST_NAME Jane John
    LAST_NAME Doe Doe
    MIDDLE_NAME S J

    fiddle