Search code examples
pythonxmlcsvevent-log

Parsing Eventlogs XML file to CSV in Python


I am working on a simple python script to extract certain data from an xml file. The xml contains windows events and eventid. Below I am showing the code. It is failing when it needs to extract the data, but it is creating the file but is empty.

from xml.etree import ElementTree as ET
import csv

tree = ET.parse("SecurityLog-rev2.xml")
root = tree.getroot() 

url = root[0].tag[:-len("Event")]
fieldnames = ['EventID']

with open ('event_log.csv', 'w') as csvfile:
    writecsv = csv.DictWriter(csvfile, fieldnames = fieldnames)
    writecsv.writeheader()

    for event in root:
        system = event.find(url + "System")
        output = {}
        fields = ['EventID']
   # for tag,att in fields:
   #     output[tag] = system.find(url + tag).attrib[att]

        if event.find(url + "EventData") != None:
            for data in event.find(url + "EventData"):
                name = data.attrib['Name']
                output[name] = data.text

        writecsv.writerow(output)
<Event xmlns='http://schemas.microsoft.com/win/2004/08/events/event'><System><Provider Name='Microsoft-Windows-Security-Auditing' Guid='{54849625-5478-4994-A5BA-3E3B0328C30D}'/>
<EventID>4634</EventID>
<Version>0</Version><Level>0</Level><Task>12545</Task><Opcode>0</Opcode><Keywords>0x8020000000000000</Keywords><TimeCreated SystemTime='2011-04-16T15:07:53.890625000Z'/>
<EventRecordID>1410962</EventRecordID><Correlation/><Execution ProcessID='452' ThreadID='3900'/><Channel>Security</Channel><Computer>DC01.AFC.com</Computer><Security/></System>
<EventData><Data Name='TargetUserSid'>S-1-5-21-2795111079-3225111112-3329435632-1610</Data>
<Data Name='TargetUserName'>grant.larson</Data>
<Data Name='TargetDomainName'>AFC</Data><Data Name='TargetLogonId'>0x3642df8</Data><Data Name='LogonType'>3</Data></EventData></Event>

Solution

  • I am not sure what exactly you would parse. Here is a solution for the Id and the events:

    Your XML File provided above as Input:

    <?xml version="1.0" encoding="utf-8"?>
    <Event xmlns='http://schemas.microsoft.com/win/2004/08/events/event'>
      <System>
        <Provider Name='Microsoft-Windows-Security-Auditing' Guid='{54849625-5478-4994-A5BA-3E3B0328C30D}' />
        <EventID>4634</EventID>
        <Version>0</Version>
        <Level>0</Level>
        <Task>12545</Task>
        <Opcode>0</Opcode>
        <Keywords>0x8020000000000000</Keywords>
        <TimeCreated SystemTime='2011-04-16T15:07:53.890625000Z' />
        <EventRecordID>1410962</EventRecordID>
        <Correlation />
        <Execution ProcessID='452' ThreadID='3900' />
        <Channel>Security</Channel>
        <Computer>DC01.AFC.com</Computer>
        <Security />
      </System>
      <EventData>
        <Data Name='TargetUserSid'>S-1-5-21-2795111079-3225111112-3329435632-1610</Data>
        <Data Name='TargetUserName'>grant.larson</Data>
        <Data Name='TargetDomainName'>AFC</Data>
        <Data Name='TargetLogonId'>0x3642df8</Data>
        <Data Name='LogonType'>3</Data>
      </EventData>
    </Event>
    

    The program code without regex for catching the namespace:

    from xml.etree import ElementTree as ET
    import pandas as pd
    import csv
    
    tree = ET.parse("SecurityLog-rev2.xml")
    root = tree.getroot()
    ns = "{http://schemas.microsoft.com/win/2004/08/events/event}"
    
    data = []
    for eventID in root.findall(".//"):
        
        if eventID.tag == f"{ns}System":
            for e_id in eventID.iter():
                if e_id.tag == f'{ns}EventID':
                    row =  "EventID", e_id.text
                    data.append(row)
        
        if eventID.tag == f"{ns}EventData":
            for attr in eventID.iter():
                if attr.tag == f'{ns}Data':
                    #print(attr.attrib)
                    row = attr.get('Name'), attr.text
                    data.append(row)
                    
    df = pd.DataFrame.from_dict(data, orient='columns')
    df.to_csv('event_log.csv', index=False, header=False)
    print(df)
    

    Output:

                      0                                               1
    0           EventID                                            4634
    1     TargetUserSid  S-1-5-21-2795111079-3225111112-3329435632-1610
    2    TargetUserName                                    grant.larson
    3  TargetDomainName                                             AFC
    4     TargetLogonId                                       0x3642df8
    5         LogonType                                               3
    

    The CSV File doesn't contain the index and header:

    EventID,4634
    TargetUserSid,S-1-5-21-2795111079-3225111112-3329435632-1610
    TargetUserName,grant.larson
    TargetDomainName,AFC
    TargetLogonId,0x3642df8
    LogonType,3 
    

    You can tanspose() the output:

    df.T.to_csv('event_log.csv', index=False, header=False)