Search code examples
sqlsql-servernullazure-sql-server

Single SQL query to find null values in all columns in a data base


I would like to identify the number of null values in each column in all tables.I have a data base it consist of around 250 tables.Most of them are in use.The problem is almost all tables contain unwanted columns which created for some short term use.Now we want to identify columns with null values for all tables.Since the count of table is large and time is less.I would like to know an easiest way to identify null record count on each table in column wise.

I tried this query which i got from internet.But in this i have to give each table name manually.

DECLARE @cols1 NVARCHAR(MAX);
DECLARE @sql NVARCHAR(MAX);

SELECT @cols1 = STUFF((
    SELECT ', COUNT(CASE WHEN ISNULL(CONVERT(NVARCHAR(MAX), [' + t1.NAME + ']),'''' ) = '''' THEN 1 END) AS ' + t1.name
    FROM sys.columns AS t1
    WHERE t1.object_id = OBJECT_ID('Area')
    -- ORDER BY ', COUNT([' + t1.name + ']) AS ' + t1.name
    FOR XML PATH('')
), 1, 2, '');

SET @sql = '
SELECT ' + @cols1 + '
FROM Area
'
EXEC(@sql)

Please help me get a improved query get Result.

Thanku


Solution

  • This is a mess, but it works:

    DECLARE @SQL nvarchar(MAX),
            @CRLF nchar(2) = NCHAR(13) + NCHAR(10);
    
    CREATE TABLE #NullCounts (SchemaName sysname,
                              TableName sysname,
                              ColumnName sysname,
                              NULLCount bigint);
    
    DECLARE @Delimiter nchar(3) = ',' +@CRLF;
    
    SET @SQL = STUFF((SELECT @CRLF + @CRLF +
                             N'WITH Counts AS(' + @CRLF +
                             N'    SELECT N' + QUOTENAME(s.[name],'''') +N' AS SchemaName,' + @CRLF +
                             N'           N' + QUOTENAME(t.[name],'''') +N' AS TableName,' + @CRLF +
                             STRING_AGG(N'           COUNT_BIG(CASE WHEN ' + QUOTENAME(c.[name]) + N' IS NULL THEN 1 END) AS ' + QUOTENAME(c.[name]),@Delimiter) WITHIN GROUP(ORDER BY c.column_id) + @CRLF +
                             N'    FROM ' + QUOTENAME(s.[name]) + N'.' + QUOTENAME(t.[name]) + N' T)' + @CRLF +
                             N'INSERT INTO #NullCounts(SchemaName, TableName, ColumnName, NULLCount)' + @CRLF +
                             N'SELECT SchemaName,' + @CRLF +
                             N'       TableName,' + @CRLF +
                             N'       V.ColumnName,' + @CRLF +
                             N'       V.NULLCount' + @CRLF +
                             N'FROM Counts C' + @CRLF +
                             N'     CROSS APPLY (VALUES' +
                             STUFF(STRING_AGG(N'                        (N' + QUOTENAME(c.[name], '''') + N', C.' + QUOTENAME(c.[name]) + N')',@Delimiter) WITHIN GROUP (ORDER BY c.column_id),1,24,N'') + N')V(ColumnName,NULLCount);'
                      FROM sys.schemas s
                           JOIN sys.tables t ON s.schema_id = t.schema_id
                           JOIN sys.columns c ON t.object_id = c.object_id
                      GROUP BY s.[name], t.[name]
                      FOR XML PATH(N''),TYPE).value('.','nvarchar(MAX)'),1,4,N'');
         
    --PRINT @SQL; --This is gunna be way longer than 4,000 characters, so you'll want SELECT
    
    EXEC sys.sp_executesql @SQL;
    
    GO
    
    SELECT *
    FROM #NullCounts
    ORDER BY SchemaName,
             TableName,
             ColumnName;
    
    GO
    
    DROP TABLE #NullCounts;
    

    Yes, I mix STRING_AGG and FOR XML PATH, yes it's an eyesore, but the printed (selected) SQL produces some very nice statements. See below:

    WITH Counts AS(
        SELECT N'dbo' AS SchemaName,
               N'PerformanceTest' AS TableName,
               COUNT_BIG(CASE WHEN TestID IS NULL THEN 1 END) AS [TestID],
               COUNT_BIG(CASE WHEN TestTarget IS NULL THEN 1 END) AS [TestTarget],
               COUNT_BIG(CASE WHEN TestName IS NULL THEN 1 END) AS [TestName],
               COUNT_BIG(CASE WHEN TimeStart IS NULL THEN 1 END) AS [TimeStart],
               COUNT_BIG(CASE WHEN TimeEnd IS NULL THEN 1 END) AS [TimeEnd],
               COUNT_BIG(CASE WHEN TimeTaken_ms IS NULL THEN 1 END) AS [TimeTaken_ms],
               COUNT_BIG(CASE WHEN TotalRows IS NULL THEN 1 END) AS [TotalRows],
               COUNT_BIG(CASE WHEN RowSets IS NULL THEN 1 END) AS [RowSets],
               COUNT_BIG(CASE WHEN AvgRowsPerSet IS NULL THEN 1 END) AS [AvgRowsPerSet]
        FROM [dbo].[PerformanceTest] T)
    INSERT INTO #NullCounts(SchemaName, TableName, ColumnName, NULLCount)
    SELECT SchemaName,
           TableName,
           V.ColumnName,
           V.NULLCount
    FROM Counts C
         CROSS APPLY (VALUES(N'TestID', C.[TestID]),
                            (N'TestTarget', C.[TestTarget]),
                            (N'TestName', C.[TestName]),
                            (N'TimeStart', C.[TimeStart]),
                            (N'TimeEnd', C.[TimeEnd]),
                            (N'TimeTaken_ms', C.[TimeTaken_ms]),
                            (N'TotalRows', C.[TotalRows]),
                            (N'RowSets', C.[RowSets]),
                            (N'AvgRowsPerSet', C.[AvgRowsPerSet]))V(ColumnName,NULLCount);
    
    WITH Counts AS(
        SELECT N'dbo' AS SchemaName,
               N'someTable' AS TableName,
               COUNT_BIG(CASE WHEN id IS NULL THEN 1 END) AS [id],
               COUNT_BIG(CASE WHEN SomeCol IS NULL THEN 1 END) AS [SomeCol]
        FROM [dbo].[someTable] T)
    INSERT INTO #NullCounts(SchemaName, TableName, ColumnName, NULLCount)
    SELECT SchemaName,
           TableName,
           V.ColumnName,
           V.NULLCount
    FROM Counts C
         CROSS APPLY (VALUES(N'id', C.[id]),
                            (N'SomeCol', C.[SomeCol]))V(ColumnName,NULLCount);
    

    And yes, I really spent the last 45 minutes writing all that...

    Honestly, this is not entry level, and if you don't understand it, you shouldn't be using it; but also, you I very much doubt you'll find a different solution that is entry level and that is as performant as this. A CURSOR, for example, although probably easier to understand would be really slow doing this.

    Caveat: If you have any deprecated data types in your database (i.e. text) this will fail. If that is the case, you will need to ensure you eliminate them from the query in the WHERE. However, I suggest that you fix your data types (text, for example, has been deprecated for 15 years).