Search code examples
sqlsql-servert-sqlsql-server-2014-express

Find tables that are copied / duplicate structure within database


I have 1000's of tables in SQL Server that have each been created from CSV files. The data in each table is all similar with each table representing a different day.

The problem I am having, is that there are lots of variations in the structure of the tables as well as the names of columns.

Some tables do however have matching structures, and I thought a good starting point for combining the data would be to combine all the data in those one together.

I have been looking for a way to query the database in order to find these tables that have the same structure but have been as yet unsuccessful.

Any help would be greatly appreciated.


Solution

  • So if the tables are truly identical, then try this out. I actually use it to create your insert statements and it can drop the old tables if you want it to.

    IF OBJECT_ID('dbo.table1') IS NOT NULL DROP TABLE dbo.table1;
    IF OBJECT_ID('dbo.table2') IS NOT NULL DROP TABLE dbo.table2;
    IF OBJECT_ID('dbo.table3') IS NOT NULL DROP TABLE dbo.table3;
    IF OBJECT_ID('dbo.table4') IS NOT NULL DROP TABLE dbo.table4;
    IF OBJECT_ID('dbo.table5') IS NOT NULL DROP TABLE dbo.table5;
    
    CREATE TABLE table1 (ID INT,FirstName VARCHAR(25),LastName NVARCHAR(25),EntryDate DATETIME,AvgScore NUMERIC(18,6)); --table1
    CREATE TABLE table2 (ID INT,FirstName VARCHAR(25),LastName NVARCHAR(25),EntryDate DATETIME,AvgScore NUMERIC(18,6)); --matches table1
    CREATE TABLE table3 (ID INT,FirstName VARCHAR(25),LastName NVARCHAR(25),EntryDate DATETIME); --table3
    CREATE TABLE table4 (ID INT,FirstName VARCHAR(25),LastName NVARCHAR(25),EntryDate DATETIME); --matches table3
    CREATE TABLE table5 (ID INT,FirstName VARCHAR(25),LastName NVARCHAR(25),EntryDate DATETIME,AvgScore NUMERIC(18,6)); --matches table1
    
    
    
    
    WITH CTE_matching_Tables
    AS
    (
        SELECT
                A.TABLE_NAME primaryTable,
                A.total_columns,
                COUNT(*) AS matching_columns,
                B.TABLE_NAME AS matchedTable
        FROM        (SELECT *, MAX(ORDINAL_POSITION) OVER (PARTITION BY Table_NAME) AS total_columns FROM INFORMATION_SCHEMA.COLUMNS) A
        INNER JOIN  (SELECT *, MAX(ORDINAL_POSITION) OVER (PARTITION BY Table_NAME) AS total_columns FROM INFORMATION_SCHEMA.COLUMNS) B
        ON      A.TABLE_NAME < B.TABLE_NAME
            AND A.ORDINAL_POSITION = B.ORDINAL_POSITION
            AND A.total_columns = B.total_columns
            AND A.COLUMN_NAME = B.COLUMN_NAME
            AND A.DATA_TYPE = B.DATA_TYPE
            AND A.IS_NULLABLE = B.IS_NULLABLE
            AND (       (A.CHARACTER_MAXIMUM_LENGTH = B.CHARACTER_MAXIMUM_LENGTH) 
                    OR  (A.CHARACTER_MAXIMUM_LENGTH IS NULL AND B.CHARACTER_MAXIMUM_LENGTH IS NULL)
                )
            AND (       (A.NUMERIC_PRECISION = B.NUMERIC_PRECISION) 
                    OR  (A.NUMERIC_PRECISION IS NULL AND B.NUMERIC_PRECISION IS NULL)
                )
            AND (       (A.NUMERIC_SCALE = B.NUMERIC_SCALE) 
                    OR  (A.NUMERIC_SCALE IS NULL AND B.NUMERIC_SCALE IS NULL)
                )
            AND (       (A.DATETIME_PRECISION = B.DATETIME_PRECISION) 
                    OR  (A.DATETIME_PRECISION IS NULL AND B.DATETIME_PRECISION IS NULL)
                )
        GROUP BY A.TABLE_NAME,A.total_columns,B.TABLE_NAME
        HAVING A.total_columns = COUNT(*)
    )
    
    --CTE has all table matches. I find the lowest occurring primaryTable for each matchedTable
        --That way in my case table2 and table 5 insert into table 1 even though table2 and table5 also match
    SELECT  'INSERT INTO ' + MIN(primaryTable) + ' SELECT * FROM ' + matchedTable + '; DROP TABLE ' + matchedTable + ';'
    FROM CTE_matching_Tables
    GROUP BY matchedTable
    

    Results:

    INSERT INTO table1 SELECT * FROM table2; DROP TABLE table2;
    INSERT INTO table3 SELECT * FROM table4; DROP TABLE table4;
    INSERT INTO table1 SELECT * FROM table5; DROP TABLE table5;