Search code examples
sql-servert-sqlparsingexpressionstatements

SQL Server: How to parse code into its different statements


As a possible sotlution to SQL Server: How to find what lines are executed I proposed that I could add inserts statements after each statement.

What would be an efficient way to split up code of a stored procedure into its different statements so that after each statement an extra insert statement with the previous line can be added? If the exact same line occurs more then once inside the procedure/function/trigger they also need to be made unique with some number.

Comments and styling do not have to be taken into consideration. But it is important that exact execution flow can be followed

Example input 1:

/*******************************************************************************************
    description
    @param wioho
*******************************************************************************************/
CREATE PROC usp_Example1
    (
        @param VARCHAR(MAX),
        @param2 INT
    )
AS
BEGIN
    BEGIN TRY
        -- random comment
        INSERT INTO dept VALUES (@param, @param2)
        IF EXISTS (
                SELECT 1
                    FROM dept 
                    WHERE deptno = 10
            )
            THROW 50001, 'Dept 10 exists', 1
        ELSE
            INSERT INTO dept VALUES (@param, @param2)
    END TRY
    BEGIN CATCH
        THROW
    END CATCH
END

Expected output 1 (or functionally equivelant):

/*******************************************************************************************
    description
    @param wioho
*******************************************************************************************/
CREATE PROC usp_Example1
    (
        @param VARCHAR(MAX),
        @param2 INT
    )
AS
BEGIN
    BEGIN TRY
        INSERT INTO coverageTrace VALUES ('usp_Example1', 'BEGIN TRY', 1)
        -- random comment
        INSERT INTO dept VALUES (@param, @param2)
        INSERT INTO coverageTrace VALUES ('usp_Example1', 'INSERT INTO dept VALUES (@param, @param2)', 1)
        IF EXISTS (
                SELECT 1
                    FROM dept 
                    WHERE deptno = 10
            )
            BEGIN
                INSERT INTO coverageTrace VALUES ('usp_Example1', 'IF EXISTS (SELECT 1 FROM dept WHERE deptno = 10)', 1)
                THROW 50001, 'Dept 10 exists', 1
            END
        ELSE IF 1 = 1
        BEGIN
            INSERT INTO dept VALUES (@param, @param2)
            INSERT INTO coverageTrace VALUES ('usp_Example1', 'INSERT INTO dept VALUES (@param, @param2)', 2)
        END
    END TRY
    BEGIN CATCH
        INSERT INTO coverageTrace VALUES ('usp_Example1', 'BEGIN CATCH', 1)
        THROW
    END CATCH
END

Now if someone does not properly style their code this should still work. Example input 2:

/*******************************************************************************************
    description @param wioho
*******************************************************************************************/
CREATE PROC usp_Example1(@param VARCHAR(MAX),@param2 INT) AS BEGIN
    BEGIN TRY-- random comment
INSERT INTO dept VALUES (@param, @param2) IF EXISTS (
                SELECT 1
                    FROM dept 
                    WHERE deptno = 10
            )
            THROW 50001, 'Dept 10 exists', 1 ELSE
            INSERT INTO dept VALUES (@param, @param2) END TRY BEGIN CATCH
        THROW
    END CATCH
END

This should give the (functionally) equivelant code to expected output 1

Please note that this code in case of a block statements needs to be able to know whether or not BEGIN and END where explicitly used. So the code can add it explicitly if it is needed.

Is there any code available that can be reused or perhaps regexes I can use. If possible I would like to do this within SQL so my mutation testing framework can be one file on any MS SQL Server.

Please note that: this is a testing framework and manually changing code is not an option, this has to be done automaticly.

Progress update: After @Jeroen Mostert comment I started experimenting with the extended event system. I still have a couple of problems to tackle, how do you properly filter the generated XML and how do you only trace in the database without hardcoding in the database name?(Fixed by code generation (didn't release I needed to use a wide character set inside generation))

Current code:

    USE master
    GO

    DROP DATABASE IF EXISTS testMSSQLDB
    GO

    CREATE DATABASE testMSSQLDB
    GO

    USE testMSSQLDB
    GO

    CREATE TYPE ID FROM INT
    GO

    CREATE TABLE dept (
        deptno ID PRIMARY KEY
    )
    GO

    IF EXISTS(SELECT * FROM sys.server_event_sessions WHERE name='testMSSQLTrace')  
       DROP EVENT SESSION testMSSQLTrace ON SERVER;  

    DECLARE @cmd VARCHAR(MAX) = '';
SELECT @cmd = 'CREATE EVENT SESSION testMSSQLTrace 
ON SERVER
    ADD EVENT sqlserver.module_end
    (SET collect_statement = (1)
        WHERE (sqlserver.database_name = N''' + DB_NAME() + ''')),
    --ADD EVENT sqlserver.rpc_completed
    --(WHERE (sqlserver.database_name = N''' + DB_NAME() + ''')),
    ADD EVENT sqlserver.sp_statement_completed
        (WHERE (sqlserver.database_name = N''' + DB_NAME() + ''')),
    --ADD EVENT sqlserver.sql_batch_completed
    --(WHERE (sqlserver.database_name = N''' + DB_NAME() + ''')),
    ADD EVENT sqlserver.sql_statement_completed
        (WHERE (sqlserver.database_name = N''' + DB_NAME() + '''))
    ADD TARGET package0.ring_buffer
        WITH (
            MAX_MEMORY = 2048 KB,
            -- EVENT_RETENTION_MODE = NO_EVENT_LOSS,
            MAX_DISPATCH_LATENCY = 3 SECONDS,
            MAX_EVENT_SIZE = 0 KB,
            MEMORY_PARTITION_MODE = NONE,
            TRACK_CAUSALITY = OFF,
            STARTUP_STATE = OFF
        );'

EXEC (@cmd)

    ALTER EVENT SESSION testMSSQLTrace
          ON SERVER
        STATE = STOP; 

    ALTER EVENT SESSION testMSSQLTrace
          ON SERVER
        STATE = START;  

    GO

    CREATE OR ALTER PROC usp_temp
        (
            @param INT = 10 
        )
    AS
    BEGIN
        IF @param = 10
        BEGIN
            DELETE dept
            INSERT INTO dept VALUES (@param)
            SELECT * FROM dept
        END
        ELSE
            DELETE dept
    END
    GO

    EXEC usp_temp
    EXEC usp_temp 20

    SELECT name, target_name, CAST(xet.target_data AS xml)
    FROM sys.dm_xe_session_targets AS xet  
    JOIN sys.dm_xe_sessions AS xe  
       ON (xe.address = xet.event_session_address)  
    WHERE xe.name = 'testMSSQLTrace'

This generates (cut out some parts):

<RingBufferTarget truncated="0" processingTime="0" totalEventsProcessed="12" eventCount="12" droppedCount="0" memoryUsed="2012">
<event name="sp_statement_completed" package="sqlserver" timestamp="2019-07-04T09:22:30.472Z">
    <data name="source_database_id">
      <type name="uint32" package="package0" />
      <value>22</value>
    </data>
    <data name="object_id">
      <type name="int32" package="package0" />
      <value>1916742081</value>
    </data>
    <data name="object_type">
      <type name="object_type" package="sqlserver" />
      <value>8272</value>
      <text>PROC</text>
    </data>
    <data name="duration">
      <type name="int64" package="package0" />
      <value>22</value>
    </data>
    <data name="cpu_time">
      <type name="uint64" package="package0" />
      <value>0</value>
    </data>
    <data name="physical_reads">
      <type name="uint64" package="package0" />
      <value>0</value>
    </data>
    <data name="logical_reads">
      <type name="uint64" package="package0" />
      <value>3</value>
    </data>
    <data name="writes">
      <type name="uint64" package="package0" />
      <value>0</value>
    </data>
    <data name="row_count">
      <type name="uint64" package="package0" />
      <value>1</value>
    </data>
    <data name="last_row_count">
      <type name="uint64" package="package0" />
      <value>1</value>
    </data>
    <data name="nest_level">
      <type name="uint16" package="package0" />
      <value>1</value>
    </data>
    <data name="line_number">
      <type name="int32" package="package0" />
      <value>11</value>
    </data>
    <data name="offset">
      <type name="int32" package="package0" />
      <value>214</value>
    </data>
    <data name="offset_end">
      <type name="int32" package="package0" />
      <value>276</value>
    </data>
    <data name="object_name">
      <type name="unicode_string" package="package0" />
      <value />
    </data>
    <data name="statement">
      <type name="unicode_string" package="package0" />
      <value>INSERT INTO dept VALUES (@param)</value>
    </data>
  </event>
  <event name="sp_statement_completed" package="sqlserver" timestamp="2019-07-04T09:22:30.476Z">
    <data name="source_database_id">
      <type name="uint32" package="package0" />
      <value>22</value>
    </data>
    <data name="object_id">
      <type name="int32" package="package0" />
      <value>1916742081</value>
    </data>
    <data name="object_type">
      <type name="object_type" package="sqlserver" />
      <value>8272</value>
      <text>PROC</text>
    </data>
    <data name="duration">
      <type name="int64" package="package0" />
      <value>32</value>
    </data>
    <data name="cpu_time">
      <type name="uint64" package="package0" />
      <value>0</value>
    </data>
    <data name="physical_reads">
      <type name="uint64" package="package0" />
      <value>0</value>
    </data>
    <data name="logical_reads">
      <type name="uint64" package="package0" />
      <value>2</value>
    </data>
    <data name="writes">
      <type name="uint64" package="package0" />
      <value>0</value>
    </data>
    <data name="row_count">
      <type name="uint64" package="package0" />
      <value>1</value>
    </data>
    <data name="last_row_count">
      <type name="uint64" package="package0" />
      <value>1</value>
    </data>
    <data name="nest_level">
      <type name="uint16" package="package0" />
      <value>1</value>
    </data>
    <data name="line_number">
      <type name="int32" package="package0" />
      <value>12</value>
    </data>
    <data name="offset">
      <type name="int32" package="package0" />
      <value>286</value>
    </data>
    <data name="offset_end">
      <type name="int32" package="package0" />
      <value>320</value>
    </data>
    <data name="object_name">
      <type name="unicode_string" package="package0" />
      <value />
    </data>
    <data name="statement">
      <type name="unicode_string" package="package0" />
      <value>SELECT * FROM dept</value>
    </data>
  </event>
</RingBufferTarget>

How can I filter this XML in a way that only the executed statement the object type and object id from where it was executed remain? The concrete information need is that I need to know what lines of a stored procedure where executed, a stored procedure can call other stored procedure, in this case I still need to know what statements the procedure executed and that it was nested in the first stored procedure. And if the same statement occurs multiple times I need to now its (relative) line number

Or in predicates: Procedure X in the toplevel stored procedure Y executed line Z with linenumber J

Procedure X in the toplevel stored procedure Y executed line W with linenumber I

EDIT: I did some more research and I concluded that I need all events that have a <data name="nest_level"><value>2</value></data> field. Where the 2 is any value greater than 1.

This https://www.scarydba.com/2018/09/24/extended-events-and-stored-procedure-parameter-values/ link proved to be helpful for me to get all the data.


Solution

  • So the extended events are the solution, this is how I have done it:

    IF EXISTS(SELECT * FROM sys.server_event_sessions WHERE name='testMSSQLTrace')  
       DROP EVENT SESSION testMSSQLTrace ON SERVER;  
    
    DECLARE @cmd VARCHAR(MAX) = '';
    SELECT @cmd = 'CREATE EVENT SESSION testMSSQLTrace 
    ON SERVER
        ADD EVENT sqlserver.sp_statement_completed
            (WHERE (sqlserver.database_name = N''' + DB_NAME() + '''))
        ADD TARGET package0.ring_buffer
            WITH (
                MAX_MEMORY = 2048 KB,
                EVENT_RETENTION_MODE = NO_EVENT_LOSS,
                MAX_DISPATCH_LATENCY = 3 SECONDS,
                MAX_EVENT_SIZE = 0 KB,
                MEMORY_PARTITION_MODE = NONE,
                TRACK_CAUSALITY = OFF,
                STARTUP_STATE = OFF
            );'
    
    EXEC (@cmd)
    

    This creates an event that can be fired after every statement completion, this is done dynamicly to filter on the database

    Then I have 3 procedures that make controlling this event easy

    /*******************************************************************************************
        Starts the statement trace
    *******************************************************************************************/
    CREATE OR ALTER PROC testMSSQL.Private_StartTrace
    AS
    BEGIN 
        ALTER EVENT SESSION testMSSQLTrace
              ON SERVER
            STATE = START; 
    END
    GO
    
    /*******************************************************************************************
        Ends the statement trace, this also clears the trace
    *******************************************************************************************/
    CREATE OR ALTER PROC testMSSQL.Private_StopTrace
    AS
    BEGIN
        ALTER EVENT SESSION testMSSQLTrace
              ON SERVER
            STATE = STOP; 
    END
    GO
    
    
    /*******************************************************************************************
        Saves the statements trace
    *******************************************************************************************/
    CREATE OR ALTER PROC testMSSQL.Private_SaveTrace
    AS
    BEGIN
        DECLARE @xml XML;
    
        SELECT @xml = CAST(xet.target_data AS xml)
            FROM sys.dm_xe_session_targets AS xet INNER JOIN sys.dm_xe_sessions AS xe ON (xe.address = xet.event_session_address)  
            WHERE xe.name = 'testMSSQLTrace'  
    
        INSERT INTO testMSSQL.StatementInvocations (testProcedure, procedureName, lineNumber, statement)
            SELECT testMSSQL.GetCurrentTest(), 
                OBJECT_NAME(T.c.value('(data[@name="object_id"]/value)[1]', 'int')),
                T.c.value('(data[@name="line_number"]/value)[1]', 'int'), 
                T.c.value('(data[@name="statement"]/value)[1]', 'VARCHAR(900)')
            FROM @xml.nodes('RingBufferTarget/event') T(c)
            WHERE T.c.value('(data[@name="nest_level"]/value)[1]', 'int') > 3
    
    END
    GO
    

    These procedures respectivly start and stop the trace and the last one stores the result in a table where it filters on the nest level so my own code is not traced.

    Finally I use it a bit like this:

    start trace
    start tran/savepoint
    run SetUp (users code)
    run test (users code)
    save trace
    save trace to variable
    rollback tran (also catch errors and stuff like that)
    save variable back to table so the trace is not rolled back