
Importing an Extended Events file with sys.fn_xe_file_target_read_file: how to read only the events added since the last import


I am using SQL Server 2012.

I have a long-running Extended Events session (it runs for days to capture events) that saves to a .xel file.

I have a job that runs periodically to import the data into a staging table. I am only importing the XML event_data column from the file so I can parse out the XML fields I need and save to a table for reporting.

I know when the import last ran, so I would like to select only the records that were added to the file since then.

It works now, but it imports ALL the records from the files into staging tables, parses out the fields I need (including the timestamp), and then keeps only the records whose timestamp is later than the last job run.

Because my process only inserts records that are new since the last run, the result is correct, but it does a lot of work importing and parsing the XML for ALL records in the file, including the ones I already imported on previous runs.

So I want to find a way to skip reading the records that were already imported, or at least to avoid parsing the XML for them (at the moment I have to parse it to get the timestamp that lets me exclude the records already processed).

Below is what I have. As I said, it works, but it does a lot of extra work that I could avoid if I found a way to skip the records I already imported.

I have included only the steps of my process that I need help with:

-- pull data from file path and insert into staging table
INSERT INTO #CaptureObjectUsageFileData (event_data)
SELECT cast(event_data as XML) as event_data
FROM sys.fn_xe_file_target_read_file(@FilePathNameToImport, null, null, null)


-- parse out the data needed (only columns using) and insert into temp table for parsed data
INSERT INTO #CaptureObjectUsageEventData (EventTime, EventObjectType, EventObjectName)
SELECT n.value('(@timestamp)[1]', 'datetime') AS [utc_timestamp],
       n.value('(data[@name="object_type"]/text)[1]', 'varchar(500)') AS ObjectType,
       n.value('(data[@name="object_name"]/value)[1]', 'varchar(500)') AS ObjectName
FROM (
    SELECT event_data
    FROM #CaptureObjectUsageFileData (NOLOCK)
) ed
CROSS APPLY ed.event_data.nodes('event') AS q(n)


-- select from the temp table as a separate step for speed/conversion
--  convert the timestamp to smalldatetime (minute precision, no seconds or milliseconds) so that SELECT DISTINCT does not return lots of near-duplicates
INSERT INTO DBALocal.dbo.DBObjectUsageTracking(DatabaseID, ObjectType, ObjectName, ObjectUsageDateTime)
SELECT DISTINCT @DBID, EventObjectType, EventObjectName, CAST(EventTime AS SMALLDATETIME)
FROM #CaptureObjectUsageEventData
WHERE EventTime > @LastRunDateTime

Solution

  • Okay, I've placed a comment already, but after thinking a bit more and looking into your code, this might be rather simple:

    You can store the time of your last import and use a predicate in .nodes() (just as you already do in .value() to pick the correct <data> element).

    Try something like this:

    DECLARE @LastImport DATETIME=GETUTCDATE(); --put the last import's time here (in UTC, because the event timestamps are UTC)
    
    and then
    
    CROSS apply ed.event_data.nodes('event[@timestamp cast as xs:dateTime? > sql:variable("@LastImport")]') as q(n)
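
Combined with the parsing query from the question, the idea can be tried on inline sample data. The following is a minimal sketch, not a drop-in replacement: the table variable @FileData, the two sample <event> documents, the event name sample_event, and the @LastImport value are all made up for illustration.

```sql
-- Hypothetical sample rows standing in for #CaptureObjectUsageFileData.
DECLARE @FileData TABLE (event_data XML);

INSERT INTO @FileData (event_data)
VALUES
(N'<event name="sample_event" timestamp="2014-01-01T10:00:00.000Z">
     <data name="object_type"><value>1</value><text>PROC</text></data>
     <data name="object_name"><value>OldProc</value></data>
   </event>'),
(N'<event name="sample_event" timestamp="2014-01-02T10:00:00.000Z">
     <data name="object_type"><value>1</value><text>PROC</text></data>
     <data name="object_name"><value>NewProc</value></data>
   </event>');

-- Assumed time of the previous import, in UTC like the event timestamps.
DECLARE @LastImport DATETIME = '2014-01-01T12:00:00';

-- The predicate inside .nodes() is meant to discard older <event> elements
-- before any of the .value() calls are evaluated for them.
SELECT n.value('(@timestamp)[1]', 'datetime') AS EventTime,
       n.value('(data[@name="object_type"]/text)[1]', 'varchar(500)') AS EventObjectType,
       n.value('(data[@name="object_name"]/value)[1]', 'varchar(500)') AS EventObjectName
FROM @FileData AS ed
CROSS APPLY ed.event_data.nodes('event[@timestamp cast as xs:dateTime? > sql:variable("@LastImport")]') AS q(n);
```

Note that this only saves the .value() parsing for old events. The whole file is still read and cast to XML by the first step.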
    

    Doing so, .nodes() should return only the <event> elements for which the condition is fulfilled. If this does not help, please show a reduced example of the XML and what you want to get.
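
A different approach, not part of the answer above, targets the other half of the question: not re-reading events that were already imported. sys.fn_xe_file_target_read_file returns file_name and file_offset columns, and its third and fourth arguments (initial_file_name and initial_offset) ask it to start enumerating after a previously read position. The sketch below assumes a hypothetical bookkeeping table #XEImportPosition (a permanent table would be needed to survive between job runs) and a made-up file path. The two arguments must be supplied together, or both left NULL as on the first run.

```sql
-- Hypothetical bookkeeping table holding the last position that was read.
CREATE TABLE #XEImportPosition (LastFileName NVARCHAR(260) NULL, LastFileOffset BIGINT NULL);
INSERT INTO #XEImportPosition (LastFileName, LastFileOffset) VALUES (NULL, NULL);

-- Assumed path; replace with the real .xel location.
DECLARE @FilePathNameToImport NVARCHAR(260) = N'C:\XE\CaptureObjectUsage*.xel';
DECLARE @LastFileName NVARCHAR(260), @LastFileOffset BIGINT;

SELECT @LastFileName = LastFileName, @LastFileOffset = LastFileOffset
FROM #XEImportPosition;

-- Read only what follows the stored position (both NULL means read everything).
SELECT CAST(event_data AS XML) AS event_data, file_name, file_offset
INTO #NewFileData
FROM sys.fn_xe_file_target_read_file(@FilePathNameToImport, NULL, @LastFileName, @LastFileOffset);

-- Remember the position of the last event read; nothing changes if no rows came back.
-- Assumption: ordering by file_name reflects the order in which rollover files were created.
UPDATE p
SET LastFileName = x.file_name,
    LastFileOffset = x.file_offset
FROM #XEImportPosition AS p
CROSS JOIN (
    SELECT TOP (1) file_name, file_offset
    FROM #NewFileData
    ORDER BY file_name DESC, file_offset DESC
) AS x;
```

If the stored file has since been removed by file rollover, the stored position can no longer be used, so a real job would probably need a fallback such as a full read combined with the timestamp filter.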