I have a index named Events
It contains a bunch of different events, all events have a property called EventName
.
Now I want to do a query where I return everything that matches the following:
IF AccountId exists in event with EventName AccountCreated
AND there is at least 1 event with EventName FavoriteCreated
with the same AccountId -> return all events where EventName == AccountCreated
Example events:
AccountCreated
{
"AccountId": 1234,
"EventName": "AccountCreated",
"SomeOtherProperty": "Some value",
"Brand": "My Brand",
"DeviceType": "Mobile",
"EventTime": "2020-06-01T12:13:14Z"
}
FavoriteCreated
{
"AccountId": 1234,
"EventName": "FavoritesCreated,
"Brand": "My Brand",
"DeviceType": "Mobile",
"EventTime": "2020-06-01T12:13:14Z"
}
Given the following two events, I would like to create 1 query that returns the AccountCreated event.
I've tried the following but it does not work, surely I must be missing something simple?
index=events EventName=AccountCreated
[search index=events EventName=FavoriteCreated | dedup AccountId | fields AccountId]
| table AccountId, SomeOtherProperty
Im expecting ~6000 hits here but Im only getting 2298 events. What am I missing?
UPDATE Based on the answer given by @warren below, the following query works. The only problem is that it's using a JOIN which limits us to 50K results from the subsearch. When running this query I get 5900 results in total = Correct.
index=events EventName=AccountCreated AccountId=*
| stats count by AccountId, EventName
| fields - count
| join AccountId
[ | search index=events EventName=FavoriteCreated AccountId=*
| stats count by AccountId ]
| fields - count
| table AccountId, EventName
I then tried to use his updated example like this but the problem seems to be that it returns FavoriteCreated events instead of AccountCreated. When running this query I get 25 494 hits = Incorrect.
index=events AccountId=* (EventName=AccountCreated OR EventName=FavoriteCreated)
| stats values(EventName) as EventName by AccountId
| eval EventName=mvindex(EventName,-1)
| search EventName="FavoriteCreated"
| table AccountId, EventName
Update 2 - WORKING @warren is awesome, here is a full working query that only returns data from the AccountCreated events IF 1 or more FavoriteCreated event exists.
index=events AccountId=* (EventName=AccountCreated OR EventName=FavoriteCreated)
| stats
values(Brand) as Brand,
values(DeviceType) as DeviceType,
values(Email) as Email,
values(EventName) as EventName
values(EventTime) as EventTime,
values(Locale) as Locale,
values(ClientIp) as ClientIp
by AccountId
| where mvcount(EventName)>1
| eval EventName=mvindex(EventName,0)
| eval EventTime=mvindex(EventTime,0)
| eval ClientIp=mvindex(ClientIp,0)
| eval DeviceType=mvindex(DeviceType,0)
You found, perhaps, one factor of your issues - that subsearches are capped at 50,000 (when doing a join
) events (or 60 seconds run time (or 10,000 results when you use a "normal" subsearch)).
Start by dumping dedup
in favor of stats
:
index=events EventName=AccountCreated AccountId=*
| stats count by AccountId, SomeOtherProperty [, more, fields, as, desired]
| fields - count
| search
[ | search index=events EventName=FavoriteCreated AccountId=*
| stats count by AccountId
| fields - count]
<rest of search>
If that doesn't get you where you want to be (ie, you still have too many results in your subsearch), you can try join
:
index=events EventName=AccountCreated AccountId=*
| stats count by AccountId, SomeOtherProperty [, more, fields, as, desired]
| fields - count
| join AccountId
[ | search index=events EventName=FavoriteCreated AccountId=*
| stats count by AccountId ]
| fields - count
<rest of search>
There are yet more ways of doing what you're looking for - but these two should get you a long ways toward your goal
Here's a join
-less approach which'll show only "FavoriteCreated" events:
index=events AccountId=* (EventName=AccountCreated OR EventName=FavoriteCreated)
| stats values(EventName) as EventName values(SomeOtherProperty) as SomeOtherProperty by AccountId
| eval EventName=mvindex(EventName,-1)
| search EventName="FavoriteCreated"
And here's one that shows "FavoriteCreated" only if there was also an "AccountCreated" event in the same timeframe:
index=events AccountId=* (EventName=AccountCreated OR EventName=FavoriteCreated)
| stats values(EventName) as EventName values(SomeOtherProperty) as SomeOtherProperty by AccountId
| where mvcount(EventName)>1
And if you want to 'pretend' the values()
didn't happen (ie, throw-out the "favoriteCreated" entry), add this:
| eval EventName=mvindex(EventName,0)