Assume if I type index=endpoints
and hit enter in the search bar, I will get results that look something like this:
{
"user": Jack,
"os_name": "Windows",
"hostname": "Windows-JACK-01",
"pid": "30219",
"app": "/usr/bin/curl",
"cmdline": "curl google.com",
"epoch": "1503452096",
"type": "processes"
}
. . .
{
"hostname": "Windows-JACK-01",
"pid": "30219",
"app": "/usr/bin/curl",
"epoch": "1503452096",
"ip": "123.123.123.123",
"port": "1234",
"type": "sockets"
}
. . .
There are two types of data under the same index - sockets and processes. I would like to find a way to combine these two types of data (that are related) so that I could get a richer data that has all the information.
+-------------------------+-----------------+---------+
| hostname | pid | app | osname | ip | port | etc.... |
+-------------------------+-----------------+---------+
| ... | ... | ... | ... | x | y | ... |
+-------------------------+-----------------+---------+
The problem is if I simply do something like:
index=endpoints type="processes"
| join left=L right=L WHERE L.pid=R.pid [ search index=endpoints type="sockets" ]
Most of the time, I will get a wrong mapping between the app
and the pid
because any applications can get assigned to same pid
when it's available.
I'm thinking maybe if I add more conditions, it will reduce the inaccuracy rate. For example, instead of L.pid=R.pid
only, maybe I could do L.pid=R.pid AND L.hostname=R.hostname AND ...
My naive approach was to add more condition in the WHERE part
index=endpoints type="processes"
| join left=L right=L WHERE (L.pid=R.pid AND L.x=R.x AND...)
[ search index=endpoints type="sockets" ]
However, it seems like that's not how it works. Any advice?
From your example queries I guess you are an experienced SQL user who is new to Splunk and hasn't read the manual about the join
command. join
does not accept a where
clause nor does it have left
or right
options. As a best practice, one should avoid join
as much as possible since it is very inefficient.
Try using stats
, instead. We use stats
for its grouping feature rather than to calculate statistics.
index=endpoints (type="processes" OR type="sockets")
| stats values(*) as * by hostname, pid