Search code examples
sql-servert-sql

Given a view in SQL Server with a union query, can I find out which source columns from each source table link by output column by ordinal position?


We have multiple tables with similar structures, and we consolidate them into covering views using UNION ALL statements. The source tables do not have exact column names, or data types. The tables/views are wide enough and with enough UNIONs such that it makes updating and maintaining the views a nightmare. To taht end I have built a meta-data driven dynamic SQL process to ease this maintenance going forward.

Having built the meta-data tables and procedure, I do not wish to unleash it upon my colleagues without adding the meta-data, nor do I want to populate it all myself by hand. Therefore I was hoping to extract the column data types and ordinality from system views/procs/funcs via a one-time script.

Is it at all possible to retrieve this information programatically?

Example view:

CREATE VIEW info.vw_ActivitySummary AS
SELECT
    Financial_Year AS FINANCIAL_YEAR
  , CAST(NULL AS INT) AS IMPORT_ID
  , Monthly_Activity AS ACTIVITY_VALUE
FROM info.Activity_2324
UNION ALL SELECT
    FINANCIAL_YEAR AS FINANCIAL_YEAR
  , ImportFlowNumber AS IMPORT_ID
  , CAST(NULL AS DECIMAL(38,19)) AS ACTIVITY_VALUE
FROM info.Activity_2223

Research led me to try this DMF:

SELECT * FROM sys.dm_sql_referenced_entities('info.vw_ActivitySummary', 'object')

At first I was hopeful, but this led me to a dead-end simply because there is no link between the ordinality of the output columns and the referenced columns that feed them. True, there is a link in the column order from the [referenced_minor_id] column, but other than their order, the numbers are contiguous and do not break for self-contained calculated columns (such as my CAST(NULL...)s as shown in the example).

Edit

Since there's already an answer that indicates question needs clarity, let me add:

Imagine a set of tables:

info.CoveringView
info.CoveringViewColumns
info.CoveringViewColumnsTableColumns

with cascading one-to-many relationships such that the following view would be possible:

CREATE VIEW info.CoveringViewSpecifications AS
SELECT v.VIEW_SCHEMA,
       v.VIEW_NAME,
       c.VIEW_COLUMN_NAME,
       c.ORDINAL_POSITION,
       c.DATA_TYPE_AND_SIZE,
       s.TABLE_SCHEMA,
       s.TABLE_NAME,
       s.TABLE_COLUMN_NAME
FROM info.CoveringView v
INNER JOIN info.CoveringViewColumns c
    ON c.CoveringViewId = v.Id
INNER JOIN info.CoveringViewColumnsTableColumns s
    ON s.CoveringViewColumnsId = c.Id;

And from this one can write a function that creates individual select statements that have the columns ordered correctly, with appropriate CAST(NULL...).. statements such that when UNION (ALL)'d together they form a usable covering view, all data types cast explicitly, etc.

Now imagine there's already a number of views we want to find meta-data for that have around 30-40 UNIONs in, so, can I generate that meta-data from the existing views programatically?


Solution

  • I still hold out hope there is a more elegant way to handle this, however I settled on parsing the full view definition and using string manipulation:

    SET NOCOUNT ON;
    
    DECLARE @UnionQuery NVARCHAR(MAX) = OBJECT_DEFINITION(OBJECT_ID(N'dbo.acm'));
    
    -- This should be 'UNION' or 'UNION ALL' as required. Include it exactly as 
        -- it would appear in your query, or add extra string manipulation before
        -- the replace
    DECLARE @UnionText NVARCHAR(20) = N'UNION ALL';
    
    -- Choose some single character that is guaranteed not to be in your object 
        -- definition. This shotgun character is usually a safe bet.
        -- (STRING_SPLIT only accepts single-character delimiters)
    DECLARE @Delimiter NCHAR(1) = N'¬';
    
    -- Line endings are easier to read as a variable than two concatenated CHAR
        -- functions
    DECLARE @CrLf NCHAR(2) = CHAR(13) + CHAR(10);
    
    -- Tab characters, for readability, same as above
    DECLARE @Tab NCHAR(1) = CHAR(9);
    
    -- Remove CREATE ... AS parts
    SET @UnionQuery =
        REPLACE(
            @UnionQuery
          , SUBSTRING(
                @UnionQuery
              , CHARINDEX('CREATE', @UnionQuery)
              , CHARINDEX('AS', @UnionQuery, CHARINDEX('CREATE', @UnionQuery))
                - CHARINDEX('CREATE', @UnionQuery) + 2
            )
          , ''
        );
    
    -- Remove single-line comments
    WHILE CHARINDEX('--', @UnionQuery) > 0
    BEGIN
        SET @UnionQuery =
            REPLACE(
                @UnionQuery
              , SUBSTRING(
                    @UnionQuery
                  , CHARINDEX('--', @UnionQuery)
                  , CHARINDEX(@CrLf, @UnionQuery, CHARINDEX('--', @UnionQuery))
                    - CHARINDEX('--', @UnionQuery) + 2
                )
              , ''
            );
    END;
    
    -- Remove multi-line comments
    WHILE CHARINDEX('/*', @UnionQuery) > 0
    BEGIN
        SET @UnionQuery =
            REPLACE(
                @UnionQuery
              , SUBSTRING(
                    @UnionQuery
                  , CHARINDEX('/*', @UnionQuery)
                  , CHARINDEX('*/', @UnionQuery, CHARINDEX('/*', @UnionQuery))
                    - CHARINDEX('/*', @UnionQuery) + 2
                )
              , ''
            );
    END;
    
    -- Replace UNION part with single-character delimiter for STRING_SPLIT
    SET @UnionQuery = REPLACE(@UnionQuery, @UnionText, @Delimiter);
    
    -- We can store our results here
    DROP TABLE IF EXISTS #ParsedColumns;
    CREATE TABLE #ParsedColumns (
        SourceTable NVARCHAR(128) NOT NULL
      , ColumnName NVARCHAR(128) NOT NULL
      , OrdinalPosition INT NOT NULL
    );
    
    -- Split the query into individual SELECT statements
    DECLARE @Queries AS TABLE (
        ID INT IDENTITY(1, 1)
      , Query NVARCHAR(MAX) NULL
    );
    
    INSERT INTO @Queries (
        Query
    )
    SELECT value
    FROM STRING_SPLIT(@UnionQuery, @Delimiter);
    
    -- Parse each SELECT statement
    DECLARE @Index INT = 1;
    DECLARE @CurrentQuery NVARCHAR(MAX);
    
    WHILE EXISTS (SELECT 1 FROM @Queries WHERE ID = @Index)
    BEGIN
    
        SELECT @CurrentQuery = Query
        FROM @Queries
        WHERE ID = @Index;
    
        -- Extract table name from the FROM clause 
            -- (simple version, assumes single table in each SELECT)
        DECLARE @TableName NVARCHAR(128) =
        (LTRIM(
             RTRIM(
                 SUBSTRING(
                     @CurrentQuery
                   , CHARINDEX('FROM', @CurrentQuery) + 5
                   , CHARINDEX(
                         ' '
                       , @CurrentQuery + ' '
                       , CHARINDEX('FROM', @CurrentQuery) + 5
                     ) - CHARINDEX('FROM', @CurrentQuery) - 5
                 )
             )
         )
        );
    
        -- Remove SELECT and FROM clause for easier column extraction
        DECLARE @ColumnPart NVARCHAR(MAX) =
            LTRIM(
                RTRIM(
                    REPLACE(
                        SUBSTRING(
                            @CurrentQuery
                          , CHARINDEX('SELECT', @CurrentQuery) + 6
                          , CHARINDEX('FROM', @CurrentQuery)
                            - CHARINDEX('SELECT', @CurrentQuery) - 6
                        )
                      , 'SELECT'
                      , ''
                    )
                )
            );
    
        -- Extract column names and insert into temporary table
        DECLARE @OrdinalPosition INT = 1;
        WHILE CHARINDEX(',', @ColumnPart) > 0
        BEGIN
            DECLARE @ColumnName NVARCHAR(128) =
                LTRIM(
                    RTRIM(
                        SUBSTRING(@ColumnPart, 1, CHARINDEX(',', @ColumnPart) - 1)
                    )
                );
            SET @ColumnPart =
                SUBSTRING(
                    @ColumnPart
                  , CHARINDEX(',', @ColumnPart) + 1
                  , LEN(@ColumnPart)
                );
            INSERT INTO #ParsedColumns (
                SourceTable
              , ColumnName
              , OrdinalPosition
            )
            VALUES
                (REPLACE(REPLACE(@TableName, @CrLf, ''), @Tab, '')
               , REPLACE(REPLACE(@ColumnName, @CrLf, ''), @Tab, '')
               , @OrdinalPosition);
            SET @OrdinalPosition = @OrdinalPosition + 1;
        END;
    
        -- Insert the last column
        INSERT INTO #ParsedColumns (
            SourceTable
          , ColumnName
          , OrdinalPosition
        )
        VALUES
            (REPLACE(REPLACE(@TableName, @CrLf, ''), @Tab, '')
           , REPLACE(REPLACE(LTRIM(RTRIM(@ColumnPart)), @CrLf, ''), @Tab, '')
           , @OrdinalPosition);
    
        -- Move to the next SELECT statement
        SET @Index = @Index + 1;
    END;
    

    The resulting temp table can now be used to feed my meta-data tables:

    -- Retrieve and display the parsed columns
    SELECT PARSENAME(pc.SourceTable, 2) AS SOURCE_SCHEMA
         , PARSENAME(pc.SourceTable, 1) AS SOURCE_NAME
         , pc.OrdinalPosition
         , c.COLUMN_NAME
         , PARSENAME(pc.ColumnName, 1)  AS COLUMN_NAME
         , pc.ColumnName                AS RAW_COLUMN
    FROM
        #ParsedColumns                       pc
        LEFT JOIN INFORMATION_SCHEMA.COLUMNS c
            ON c.TABLE_SCHEMA = PARSENAME(pc.SourceTable, 2)
                AND c.TABLE_NAME = PARSENAME(pc.SourceTable, 1)
                AND c.COLUMN_NAME = PARSENAME(pc.ColumnName, 1)
    ORDER BY
        pc.SourceTable
      , pc.OrdinalPosition;
    

    In my example above I have used INFORMATION_SCHEMA to quality-check the results.

    There's some assumptions based on my exact use-case, so please check carefully what your requirements are, such as each sub-query being a simple SELECT..FROM statement without joins.