Search code examples
splunk

Extracting a count from raw splunk data by id


I am trying to get a count from transactional information that is retained within raw data in splunk. I have 3-5 transactions that occur.

One has raw data stating: pin match for id 12345678-1234-1234-abcd-12345678abcd or pin mismatched for id etc.

I'm trying to count the number of times the pin match occurs within the transaction time window of 180sec.

I was trying to do something like: |eval raw=_raw |search index=transa |eval pinc= if((raw like "%pin match%"),1,0) |stats count(pinc) as Pincount by ID

The issue I'm having is it is counting cumulatively over whatever time I am looking at those transactions. Is there a way to attach it to the ID that is within the message or have it count every one that occurs within that time window?

Thanks!


Solution

  • Presuming the pin status and ID have not been extracted:

    index=ndx sourcetype=srctp "pin" "match" OR "mismatched"
    | rex field=_raw "pin (?<pin_status>\w+)"
    | rex field=_raw "id (?<id>\S+)"
    | eval status_time=pin_status+"|"+_time
    | stats earliest(status_time) as beginning latest(status_time) as ending by id
    | eval beginning=split(beginning,"|"), ending=split(ending,"|")
    | eval begining=mvindex(beginning,-1), ending=mvindex(ending,-1)
    | table id beginning ending
    | sort 0 id
    | eval beginning=strftime(beginning,"%c"), ending=strftime(ending,"%c")
    

    After extracting the status ("match" or "mismatched") and the id, append the individual event's _time to the end of the status - we'll pull that value back out after statsing

    Using stats, find the earliest and latest status_time entries (fields just created on the previous line) by id, saving them into new fields beginning and ending

    Next, split() beginning and ending on the pipe we added to separate the status from the timestamp into a multivalue field

    Then assign the last item from the multivalue field (which we know is the timestamp) into itself (because we know that the earliest entry for a status_time should always be "match", and the latest entry for a status_time should always be "mismatched")

    Lastly, table the id and time stamps, sort by id, and format the timestamp into something human readable (strftime takes many arguments, %c just happens to be quick)