Search code examples
sqlsql-serversql-server-2012

How to SELECT COUNT from tables currently being INSERT?


Hi consider there is an INSERT statement running on a table TABLE_A, which takes a long time, I would like to see how has it progressed.

What I tried was to open up a new session (new query window in SSMS) while the long running statement is still in process, I ran the query

SELECT COUNT(1) FROM TABLE_A WITH (nolock)

hoping that it will return right away with the number of rows everytime I run the query, but the test result was even with (nolock), still, it only returns after the INSERT statement is completed.

What have I missed? Do I add (nolock) to the INSERT statement as well? Or is this not achievable?


(Edit) OK, I have found what I missed. If you first use CREATE TABLE TABLE_A, then INSERT INTO TABLE_A, the SELECT COUNT will work. If you use SELECT * INTO TABLE_A FROM xxx, without first creating TABLE_A, then non of the following will work (not even sysindexes).


Solution

  • If you are using SQL Server 2016 the live query statistics feature can allow you to see the progress of the insert in real time.

    The below screenshot was taken while inserting 10 million rows into a table with a clustered index and a single nonclustered index.

    It shows that the insert was 88% complete on the clustered index and this will be followed by a sort operator to get the values into non clustered index key order before inserting into the NCI. This is a blocking operator and the sort cannot output any rows until all input rows are consumed so the operators to the left of this are 0% done.

    enter image description here

    With respect to your question on NOLOCK

    It is trivial to test

    Connection 1

    USE tempdb
    
    CREATE TABLE T2
      (
         X INT IDENTITY PRIMARY KEY,
         F CHAR(8000)
      );
    
    WHILE NOT EXISTS(SELECT * FROM   T2 WITH (NOLOCK))
      LOOP:
    
    SELECT COUNT(*) AS CountMethod FROM   T2 WITH (NOLOCK);
    
    SELECT rows FROM   sysindexes WHERE  id = OBJECT_ID('T2');
    
    RAISERROR ('Waiting for 10 seconds',0,1) WITH NOWAIT;
    
    WAITFOR delay '00:00:10';
    
    SELECT COUNT(*) AS CountMethod FROM   T2 WITH (NOLOCK);
    
    SELECT rows FROM   sysindexes WHERE  id = OBJECT_ID('T2');
    
    RAISERROR ('Waiting to drop table',0,1) WITH NOWAIT
    
    DROP TABLE T2 
    

    Connection 2

    use tempdb;
    
    --Insert 2000 * 2000 = 4 million rows
    WITH T
         AS (SELECT TOP 2000 'x' AS x
             FROM   master..spt_values)
    INSERT INTO T2
                (F)
    SELECT 'X'
    FROM   T v1
           CROSS JOIN T v2
    OPTION (MAXDOP 1) 
    

    Example Results - Showing row count increasing

    enter image description here

    SELECT queries with NOLOCK allow dirty reads. They don't actually take no locks and can still be blocked, they still need a SCH-S (schema stability) lock on the table (and on a heap it will also take a hobt lock).

    The only thing incompatible with a SCH-S is a SCH-M (schema modification) lock. Presumably you also performed some DDL on the table in the same transaction (e.g. perhaps created it in the same tran)

    For the use case of a large insert, where an approximate in flight result is fine, I generally just poll sysindexes as shown above to retrieve the count from metadata rather than actually counting the rows (non deprecated alternative DMVs are available)

    When an insert has a wide update plan you can even see it inserting to the various indexes in turn that way.

    If the table is created inside the inserting transaction this sysindexes query will still block though as the OBJECT_ID function won't return a result based on uncommitted data regardless of the isolation level in effect. It's sometimes possible to get around that by getting the object_id from sys.tables with nolock instead.