Search code examples
sqlsql-servert-sqlsql-server-2012query-optimization

Select non-empty columns using SQL Server


I am using SQL Server 2012. i have a table with 90 columns. I am trying to select only columns that contains data. After searching i used the following procedure:

1- Getting all columns count using one select query

2- Pivoting Result Table into a Temp table

3- Creating Select query

4- Executing this query

Here is the query i used:

DECLARE @strTablename  varchar(100) = 'dbo.MyTable'
DECLARE @strQuery  varchar(max) = ''
DECLARE @strSecondQuery  varchar(max) = 'SELECT '
DECLARE @strUnPivot as varchar(max) = ' UNPIVOT ([Count] for [Column] IN ('

CREATE TABLE ##tblTemp([Column] varchar(50), [Count]  Int)

SELECT @strQuery = ISNULL(@strQuery,'') + 'Count([' + name + ']) as [' + name + ']  ,' from sys.columns where object_id = object_id(@strTablename) and is_nullable = 1
SELECT @strUnPivot = ISNULL(@strUnPivot,'') + '[' + name + '] ,' from sys.columns where object_id = object_id(@strTablename) and is_nullable = 1


SET @strQuery = 'SELECT [Column],[Count] FROM ( SELECT ' + SUBSTRING(@strQuery,1,LEN(@strQuery) - 1) + ' FROM ' + @strTablename + ') AS p ' + SUBSTRING(@strUnPivot,1,LEN(@strUnPivot) - 1) + ')) AS unpvt '

INSERT INTO ##tblTemp EXEC (@strQuery)

SELECT @strSecondQuery = @strSecondQuery + '[' + [Column] + '],'  from ##tblTemp WHERE [Count] > 0

DROP TABLE ##tblTemp

SET @strSecondQuery = SUBSTRING(@strSecondQuery,1,LEN(@strSecondQuery) - 1) + ' FROM ' + @strTablename

EXEC (@strSecondQuery)

The problem is that this query is TOO SLOW. Is there a best way to achieve this?

Notes:

  • Table have only one clustered index on primary key Column ID and does not contains any other indexes.
  • Table is not editable.
  • Table contains very large data.
  • Query is taking about 1 minute to be executed

Thanks in advance.


Solution

  • I do not know if this is faster, but you might use one trick: FOR XML AUTO will ommit columns without content:

    DECLARE @tbl TABLE(col1 INT,col2 INT,col3 INT);
    INSERT INTO @tbl VALUES (1,2,NULL),(1,NULL,NULL),(NULL,NULL,NULL);
    
    SELECT * 
    FROM @tbl AS tbl
    FOR XML AUTO
    

    This is the result: col3 is missing...

    <tbl col1="1" col2="2" />
    <tbl col1="1" />
    <tbl />
    

    Knowing this, you could find the list of columns, which are not NULL in all rows, like this:

    DECLARE @ColList VARCHAR(MAX)=
    STUFF
    (
        (
        SELECT DISTINCT ',' + Attr.value('local-name(.)','nvarchar(max)')
        FROM
        (
            SELECT
            (
                SELECT *
                FROM @tbl AS tbl
                FOR XML AUTO,TYPE
            ) AS TheXML
        ) AS t
        CROSS APPLY t.TheXML.nodes('/tbl/@*') AS A(Attr) 
        FOR XML PATH('')
        ),1,1,''
    );
    
    SELECT @ColList
    

    The content of @ColList is now col1,col2. This string you can place in a dynamically created SELECT.

    UPDATE: Hints

    It would be very clever, to replace the SELECT * with a column list created from INFORMATION_SCHEMA.COLUMNS excluding all not-nullable. And - if needed and possible - types, wich contain very large data (BLOBs).

    UPDATE2: Performance

    Don't know what your very large data means actually... Just tried this on a table with about 500.000 rows (with SELECT *) and it returned correctly after less than one minute. Hope, this is fast enough...