Search code examples
splunksplunk-query

Splunk query filter out based on other event in same index


I have a index named Events

It contains a bunch of different events, all events have a property called EventName. Now I want to do a query where I return everything that matches the following:

IF AccountId exists in event with EventName AccountCreated AND there is at least 1 event with EventName FavoriteCreated with the same AccountId -> return all events where EventName == AccountCreated

Example events:

AccountCreated

{
   "AccountId": 1234,
   "EventName": "AccountCreated",
   "SomeOtherProperty": "Some value",
   "Brand": "My Brand",
   "DeviceType": "Mobile",
   "EventTime": "2020-06-01T12:13:14Z"
}

FavoriteCreated

{
   "AccountId": 1234,
   "EventName": "FavoritesCreated,
   "Brand": "My Brand",
   "DeviceType": "Mobile",
   "EventTime": "2020-06-01T12:13:14Z"
}

Given the following two events, I would like to create 1 query that returns the AccountCreated event.

I've tried the following but it does not work, surely I must be missing something simple?

index=events EventName=AccountCreated 
  [search index=events EventName=FavoriteCreated | dedup AccountId | fields AccountId]
| table AccountId, SomeOtherProperty

Im expecting ~6000 hits here but Im only getting 2298 events. What am I missing?

UPDATE Based on the answer given by @warren below, the following query works. The only problem is that it's using a JOIN which limits us to 50K results from the subsearch. When running this query I get 5900 results in total = Correct.

index=events EventName=AccountCreated AccountId=*
| stats count by AccountId, EventName
| fields - count
| join AccountId 
    [ | search index=events EventName=FavoriteCreated AccountId=*
    | stats count by AccountId ]
| fields - count
| table AccountId, EventName

I then tried to use his updated example like this but the problem seems to be that it returns FavoriteCreated events instead of AccountCreated. When running this query I get 25 494 hits = Incorrect.

index=events AccountId=* (EventName=AccountCreated OR EventName=FavoriteCreated)
| stats values(EventName) as EventName by AccountId
| eval EventName=mvindex(EventName,-1)
| search EventName="FavoriteCreated"
| table AccountId, EventName

Update 2 - WORKING @warren is awesome, here is a full working query that only returns data from the AccountCreated events IF 1 or more FavoriteCreated event exists.

index=events AccountId=* (EventName=AccountCreated OR EventName=FavoriteCreated)
| stats 
    values(Brand) as Brand,
    values(DeviceType) as DeviceType,
    values(Email) as Email,
    values(EventName) as EventName
    values(EventTime) as EventTime,
    values(Locale) as Locale,
    values(ClientIp) as ClientIp
  by AccountId
| where mvcount(EventName)>1
| eval EventName=mvindex(EventName,0)
| eval EventTime=mvindex(EventTime,0)
| eval ClientIp=mvindex(ClientIp,0)
| eval DeviceType=mvindex(DeviceType,0)

Solution

  • You found, perhaps, one factor of your issues - that subsearches are capped at 50,000 (when doing a join) events (or 60 seconds run time (or 10,000 results when you use a "normal" subsearch)).

    Start by dumping dedup in favor of stats:

    index=events EventName=AccountCreated AccountId=*
    | stats count by AccountId, SomeOtherProperty [, more, fields, as, desired]
    | fields - count
    | search 
        [ | search index=events EventName=FavoriteCreated AccountId=*
        | stats count by AccountId 
        | fields - count]
    <rest of search>
    

    If that doesn't get you where you want to be (ie, you still have too many results in your subsearch), you can try join:

    index=events EventName=AccountCreated AccountId=*
    | stats count by AccountId, SomeOtherProperty [, more, fields, as, desired]
    | fields - count
    | join AccountId 
        [ | search index=events EventName=FavoriteCreated AccountId=*
        | stats count by AccountId ]
    | fields - count
    <rest of search>
    

    There are yet more ways of doing what you're looking for - but these two should get you a long ways toward your goal

    Here's a join-less approach which'll show only "FavoriteCreated" events:

    index=events AccountId=* (EventName=AccountCreated OR EventName=FavoriteCreated)
    | stats values(EventName) as EventName values(SomeOtherProperty) as SomeOtherProperty by AccountId
    | eval EventName=mvindex(EventName,-1)
    | search EventName="FavoriteCreated"
    

    And here's one that shows "FavoriteCreated" only if there was also an "AccountCreated" event in the same timeframe:

    index=events AccountId=* (EventName=AccountCreated OR EventName=FavoriteCreated)
    | stats values(EventName) as EventName values(SomeOtherProperty) as SomeOtherProperty by AccountId
    | where mvcount(EventName)>1
    

    And if you want to 'pretend' the values() didn't happen (ie, throw-out the "favoriteCreated" entry), add this:

    | eval EventName=mvindex(EventName,0)