Search code examples
sqlpostgresqlinformation-schema

Count how many percent of values on each column are nulls


Is there a way, through the information_schema or otherwise, to calculate how many percent of each column of a table (or a set of tables, better yet) are NULLs?


Solution

  • Your query has a number of problems, most importantly you are not escaping identifiers (which could lead to exceptions at best or SQL injection attacks in the worst case) and you are not taking the schema into account. Use instead:

    SELECT 'SELECT ' || string_agg(concat('round(100 - 100 * count(', col
                      , ') / count(*)::numeric, 2) AS ', col_pct), E'\n     , ')
        || E'\nFROM   ' ||  tbl
    FROM (
       SELECT quote_ident(table_schema) || '.' || quote_ident(table_name) AS tbl
            , quote_ident(column_name) AS col
            , quote_ident(column_name || '_pct') AS col_pct
       FROM   information_schema.columns
       WHERE  table_name = 'my_table_name'
       ORDER  BY ordinal_position
       ) sub
    GROUP  BY tbl;
    

    Produces a query like:

    SELECT round(100 - 100 * count(id) / count(*)::numeric, 2) AS id_pct
         , round(100 - 100 * count(day) / count(*)::numeric, 2) AS day_pct
         , round(100 - 100 * count("oDd X") / count(*)::numeric, 2) AS "oDd X_pct"
    FROM   public.my_table_name;
    

    Closely related answer on dba.SE with a lot more details: