Search code examples
splunksplunk-querysplunk-formula

Splunk query reference field in joined data


Full disclosure, I am very new Splunk so I may explain my question incorrectly.

I have two data sources and was given a query to pull data from them individually. I am trying to join this data together so I can create some type of chart, but I am unsure of this would be a join/search etc.

My initial query is as follows:

This allows me to search through the mail logs by sender address and show all emails with a bcSendAction=1, which is a successful send.

index=mail sourcetype=barracuda [search index=mail sourcetype=barracuda bcSender="[email protected]" | table bcMsgId] bcSendAction=1

The result of this search is as follows:

enter image description here

Now, my other search is a log that shows all of the sender email addresses during a certain time period. I would like to use the result of this (the email value) in the first search so that I don't have to hard-code the bcSender, but rather have it use the results from the other source.

// Returns an email address
index=mail sourcetype=sendmail_syslog *@sfdc.net |  
rex field=from "<(?<from>.*)>" | 
table from | dedup from

I was able to parse the log and pull out just the email addresses that I want to use to plug into my first search.

I followed a few emails and tutorials, but a lot of the joins I was seeing only used two different sources/datasets and didn't use the search as I did in my first query.

My attempt at this was something like:

index=mail sourcetype=sendmail_syslog *@sfdc.net 
|  rex field=from "<(?<from>.*)>" 
| table from | dedup from 
|  join from 
    [search index=mail sourcetype=barracuda [search index=mail sourcetype=barracuda bcSender=from | table  bcMsgId] bcSendAction=1]

I don't know that I am referencing the email from the first result set correctly. Can someone point me in the right direction with how to approach this search?


Solution

  • If I understand your request properly, then you need 3 steps:

    1. get the sender addresses from index=mail sourcetype=sendmail_syslog
    2. use these sender addresses to get a list of messageID's from index=mail sourcetype=barracuda
    3. use these messageID's to finally get the events you are looking for

    This sounds like you need a subsearch (for getting the sender addresses) inside of another subsearch (for getting the messageID's), meaning your own attempt was pointing in the right direction already.

    Try something along these lines:

    index=mail sourcetype=barracuda bcSendAction=1
      [ search
        index=mail sourcetype=barracuda
          [ search
            index=mail sourcetype=sendmail_syslog *@sfdc.net 
            | rex field=from "<(?<bcSender>.*)>" 
            | stats count by bcSender
            | fields bcSender
            | format
          ]
        | stats count by bcMsgId
        | fields bcMsgId
        | format
      ]
    

    I can not really verify it without having your data, but I'll try to explain what it's supposed to do. Let's start from the innermost subsearch.

    • Line 4 starts the innermost subsearch
    • Line 5 selects the events in from which you generate the address list
    • Line 6 extracts the addresses directly into the field bcSender. (We could extract it to the field from first and then rename it, but this is more direct.) We need the fieldname to be bcSender for the outer search.
    • Line 7 is a different way to deduplicate by bcSender and at the same time reduce the amount of data which needs to be sent back from indexers to the searchhead (if you have a distributed environment).
    • Line 8 gets rid of all the fields we don't require. They would be problematic with the following format command.
    • Line 9 passes the results back to he enclosing search in a way so it can be used as part of the search string.
    • Line 10, of course, closes the innermost subsearch.

    Now let's have a look at the outer subsearch.

    • Line 2 starts the subsearch.
    • Line 3 selects the events from which we can get the messageID's. This is, of cause, augmented by the enclosed subsearch we've just discussed.
    • Line 11 again is a way to dedup the messageID's.
    • Line 12 again limits things to the field we need.
    • Line 13 passes the found messageID's to the outermost (main) search in a such a way that they become part of the search string.
    • Line 14, you already know, closes the subsearch.

    And the outermost search:

    • Line 1 selects the data you are targetting and is augmented by what the subsearches pass to it.