I found some cool code on here that has been able to give me some good information about my data in tables in SQL Server. I love how this goes through each column of the specified table to give me the information I'm looking for.
SET NOCOUNT ON
DECLARE @Schema NVARCHAR(100) = 'dbo'
DECLARE @Table NVARCHAR(100) = 'myTable'
DECLARE @sql NVARCHAR(MAX) =''
IF OBJECT_ID ('tempdb..#TablePopulation') IS NOT NULL
DROP TABLE #TablePopulation
CREATE TABLE #TablePopulation
(
TableName sysname,
ColumnName sysname,
ColumnPosition int,
NullCount numeric(9, 2),
NonNullCount numeric(9, 2),
TotalRows numeric(9, 2),
PercentNull numeric(9, 2),
PercentNotNull numeric(9, 2)
)
SELECT @sql += '
INSERT INTO #TablePopulation
SELECT
'''+TABLE_NAME+''' AS TableName
,'''+COLUMN_NAME+''' AS ColumnName
,'''+CONVERT(VARCHAR(5),ORDINAL_POSITION)+''' AS ColumnPosition
,SUM(CASE WHEN '+COLUMN_NAME+' IS NULL THEN 1 ELSE 0 END) AS CountNulls
,COUNT(' +COLUMN_NAME+') AS CountnonNulls
,COUNT(*) AS TotalRows
,SUM(CASE WHEN '+COLUMN_NAME+' IS NULL THEN 1 ELSE 0 END) / CASE WHEN COUNT(*) <> 0 THEN COUNT(*) ELSE 0.1 END * 100 AS PercentNull
,COUNT(' + COLUMN_NAME + ') / CASE WHEN COUNT(*) <> 0 THEN COUNT(*) ELSE 0.1 END * 100 AS PercentNotNull
FROM '+QUOTENAME(TABLE_SCHEMA)+'.'+QUOTENAME(TABLE_NAME)+';'+ CHAR(10)
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = @Schema
AND TABLE_NAME = @Table
--INSERT INTO #Nulls
EXEC sp_executesql @sql
SELECT *
FROM #TablePopulation
DROP TABLE #TablePopulation
I'm trying to port this code to BigQuery so I can get similar data for the information we have moved to it.
I'm not nearly as knowledgeable about BQ SQL as T-SQL, and I'm running into an error:
Query error: Scalar subquery produced more than one element at [25:12]
My BigQuery code looks like this:
DECLARE TableName STRING DEFAULT 'myTable';
DECLARE sql STRING DEFAULT '';
SET @@dataset_project_id = 'myProject';
SET @@dataset_id = 'myDataset';
DROP TABLE IF EXISTS TablePopulation;
CREATE TEMP TABLE TablePopulation
(
tableName STRING,
columnName STRING,
columnPosition INT64,
nullCount numeric(9,2),
notNullCount numeric(9,2),
totalRows numeric(9,2),
percentNull numeric(9,2),
percentNotNull numeric(9,2)
);
SET sql = (
SELECT
'INSERT INTO TablePopulation SELECT '
|| table_name || ' AS tableName,'
|| column_name || ' AS columnName, '
|| ordinal_position || ' AS columnPosition, '
|| 'SUM(CASE WHEN ' || column_name || ' IS NULL THEN 1 ELSE 0 END) AS nullCount, '
|| 'COUNT(' || column_name || ') AS notNullCount,'
|| 'COUNT(*) AS totalRows,'
|| 'SAFE_DIVIDE(SUM(CASE WHEN ' || column_name
|| ' IS NULL THEN 1 ELSE 0 END), COUNT(*)) * 100 AS percentNull,'
|| 'SAFE_DIVIDE(SUM(CASE WHEN ' || column_name
|| ' IS NOT NULL THEN 1 ELSE 0 END), COUNT(*)) * 100 AS percentNotNull'
|| 'FROM ' || table_name || ';'
FROM INFORMATION_SCHEMA.COLUMNS
WHERE table_name = TableName
);
--INSERT INTO #Nulls
EXECUTE IMMEDIATE sql;
SELECT *
FROM TablePopulation;
DROP TABLE TablePopulation;
Any suggestions on how to resolve this to help return results for a table would be greatly appreciated.
The problem is that INFORMATION_SCHEMA.COLUMNS
yields more than one row for a given table name, but your SET sql = (SELECT expression FROM INFORMATION_SCHEMA.COLUMNS WHERE table_name = TableName);
statement can only handle a single (scalar) result.
The original query used a SELECT @sql += expression FROM ...
hack to effectively concatenate the multiple generated statements into a single dynamic SQL string. You will need to do the same here.
In SQL Server, the STRING_AGG()
aggregation function is preferred, and it appears that BigQuery also has STRING_AGG()
. I am not experienced with BigQuery, but I believe the following may do the trick:
SET sql = (
SELECT STRING_AGG(
'INSERT INTO TablePopulation SELECT '
|| table_name || ' AS tableName,'
|| column_name || ' AS columnName, '
|| ordinal_position || ' AS columnPosition, '
|| 'SUM(CASE WHEN ' || column_name || ' IS NULL THEN 1 ELSE 0 END) AS nullCount, '
|| 'COUNT(' || column_name || ') AS notNullCount,'
|| 'COUNT(*) AS totalRows,'
|| 'SAFE_DIVIDE(SUM(CASE WHEN ' || column_name
|| ' IS NULL THEN 1 ELSE 0 END), COUNT(*)) * 100 AS percentNull,'
|| 'SAFE_DIVIDE(SUM(CASE WHEN ' || column_name
|| ' IS NOT NULL THEN 1 ELSE 0 END), COUNT(*)) * 100 AS percentNotNull'
|| 'FROM ' || table_name || ';'
, '')
FROM INFORMATION_SCHEMA.COLUMNS
WHERE table_name = TableName
);
This should produce a concatenated series of SQL statements like INSERT ...;INSERT ...;INSERT ...;
. If you prefer line breaks between the statements (for printing and debugging), you can replace the ''
with the appropriate BigQuery newline string - possibly '\\n'
.
Addendum On further examination, your translated code has additional problems:
table_name
and column_name
values in the generated SQL need to be quoted as string literals, so that you have SELECT "MyTable" AS tableName, "MyColumn" as columnName, ...
instead of SELECT MyTable AS tableName, MyColumn as columnName, ...
.table_name
and column_name
usages should also be modified to handle names that contain spaces or other special characters or are otherwise not valid as bare identifiers. I believe surrounding the names with back-ticks (`) is the proper (almost safe) BigQuery syntax.