Search code examples
sql-serverxmleventsevent-log

How can I import and query an XML file (in this case, a business process event log) into SQL Server Express?


To keep it short, I have this ".xes" (Extensible Event Stream) file, which is in fact an XML, and looks like this (this code only shows an example of an event - the file contains multiple events similar to this one):

<?xml version="1.0" encoding="UTF-8" ?>
<log xes.version="1.0" xes.features="nested-attributes" openxes.version="1.0RC7" xmlns="http://www.xes-standard.org/">
    <trace>
        <string key="concept:name" value="0"/>
        <event>
            <string key="org:resource" value="Call Centre Agent"/>
            <date key="time:timestamp" value="2006-01-01T00:00:00.000+01:00"/>
            <string key="concept:name" value="check if sufficient information is available"/>
            <string key="lifecycle:transition" value="start"/>
        </event>
     </trace>
...

This file represents in fact a business process event log which contains the events of different activities of a process with timestamp and further information.

I need to extract the information from this data and prepare some SQL queries as well.

I am currently using a SQL Server 2014 Express database, and having trouble importing the data and querying it.


Solution

  • This is a general approach to get a file's content into a variable: This is a general approach to get a file's content into a variable:

    DECLARE @xml XML=
    (SELECT *  FROM OPENROWSET(BULK 'C:\YourPath\XMLFile.xml',SINGLE_CLOB) AS x);
    SELECT @xml;
    

    As this is nested data (with unclear level of nesting...) this is my suggestion:

    DECLARE @log XML=
    '<log xmlns="http://www.xes-standard.org/" xes.version="1.0" xes.features="nested-attributes" openxes.version="1.0RC7">
      <trace>
        <string key="concept:name" value="0" />
        <event>
          <string key="org:resource" value="Call Centre Agent" />
          <date key="time:timestamp" value="2006-01-01T00:00:00.000+01:00" />
          <string key="concept:name" value="check if sufficient information is available" />
          <string key="lifecycle:transition" value="start" />
        </event>
        <event>
          <string key="second-resouce" value="Call Centre Agent" />
          <date key="second:timestamp" value="2006-01-01T00:00:00.000+01:00" />
          <string key="second:name" value="check if sufficient information is available" />
          <string key="second:transition" value="start" />
        </event>
      </trace>
    </log>';
    
    WITH XMLNAMESPACES(DEFAULT 'http://www.xes-standard.org/')
    SELECT TraceNode.value('string[1]/@key','varchar(max)') AS Trace_String_Key
          ,TraceNode.value('string[1]/@value','int') AS Trace_String_Value
          ,EventNode.value('date[1]/@key','varchar(max)') AS Trace_Event_Date_Key
          ,EventNode.value('date[1]/@value','datetime') AS Trace_Event_Date_Value
          ,EventStringNode.value('@key','varchar(max)') AS Trace_Event_String_Key
          ,EventStringNode.value('@value','varchar(max)') AS Trace_Event_String_Value
    
    FROM @log.nodes('/log/trace') AS a(TraceNode)
    OUTER APPLY TraceNode.nodes('event') AS b(EventNode)
    OUTER APPLY EventNode.nodes('string') AS c(EventStringNode)
    

    do you have any suggestions on how, and for what, could I query this data? Some practical examples would be useful

    Well, that's really up to you... If you ask such a question, you should know what you need it for :-)

    One idea:

    Create a relational table structure

    • Table "Log" (Each log file and side data)
    • Table "Event" (Child data to "Log")
    • Table "EventData" (Child data to "Event")

    You can use the query to retrieve your data to insert this into your tables...