Search code examples
sqlsql-serverbatch-query

How to pull data from SQL Server database using a stored procedure


I am tasked to create a stored procedure for a publisher application to retrieve employee data from our SQL Server.

The aim: We have EmpTable (lets call this the source). We also have EmpData table. It has a column called 'Status' with a default value: 'UNPROCESSED'. Our aim is to create a SP(lets call this SP1) so that: it polls all the data from the EmpTable in batches of 100 rows. It should then continue polling data from this table until SP returns zero records. Once the above completed processing, another SP(lets call this SP2) is created/called to change the status to 'COMPLETED'. If the record processing is failed permanently in the app, that is due to errors such as schema or validation errors (none retryable), SP2 to change the status to FAILED. The results is then populated in the EmpData table We aim to run this batch job once a day. Hope this make sense

I am wondering how this can be queried. I started the query:

DECLARE @id_check INT
DECLARE @batchSize INT
DECLARE @results INT

SET @results = 1 --stores the row count after each successful batch
SET @batchSize = 100 --How many rows you want to operate on each batch
SET @id_check = 0 --current batch 

-- when 0 rows returned, exit the loop
WHILE (@results > 0) 
BEGIN
   SELECT * -- This is just an example to generalize result for now
   FROM empdata

   SET @results = @@ROWCOUNT

   -- next batch
   SET @id_check = @id_check + @batchSize
END

The result I am aiming for is to return batch 1 to return 100 values, then batch 2 to return the next 100 and so on

Any help would be appreciated!


Solution

  • Unfortunately without clear requirements its hard to assist you.

    However, if what you want is to pull all the matching records, but in batches (which doesn't make a lot of sense), then you can use the following stored procedure.

    It will return the first 100 rows which meet whatever criteria you have. Then when they are loaded in your app, you call the SP again, passing in the maximum ID you received in the previous recordset.

    Your app continues to call this SP until no rows are returned.

    CREATE OR ALTER PROCEDURE dbo.MyTestProcedure1
    (
        -- Pass in the last ID processed
        @LastIdChecked int = 0
    )
    AS
    BEGIN
        SET NOCOUNT, XACT_ABORT ON;
    
        SET @LastIdChecked = COALESCE(@LastIdChecked,0);
    
        DECLARE @BatchSize int = 100;
    
        -- Return the next @BatchSize records after the last Id checked
        SELECT TOP(@BatchSize) *
        FROM dbo.EmpTable
        WHERE Id > @LastIdChecked
        ORDER BY Id ASC;
    
        RETURN 0;
    END;
    

    A more expected process would be, you use the SP to pull your first 100 records, then you fully process them in your app. Then you call the SP again, and the SP knows which records have been processed and filters them out in the WHERE clause. Then you run this until completion.

    That solution would look like this:

    CREATE OR ALTER PROCEDURE dbo.MyTestProcedure2
    AS
    BEGIN
        SET NOCOUNT, XACT_ABORT ON;
    
        DECLARE @BatchSize int = 100;
    
        -- Return the next @BatchSize records
        SELECT TOP(@BatchSize) *
        FROM dbo.EmpTable T
        WHERE {Record is unprocessed}
        -- e.g. if you are expeceting 1 record a day per dbo.EmpTable record.
        -- WHERE NOT EXISTS (
        --    SELECT 1
        --    FROM dbo.EmpData D
        --    WHERE T.EmpId = D.EmpId AND D.Date = CONVERT(date, GETDATE())
        --)
        ORDER BY Id ASC;
    
        RETURN 0;
    END;