Search code examples
splunksplunk-query

Search using Lookup from a single field CSV file


I have a list of usernames that I have to monitor and the list is growing every day. I read Splunk documentation and it seems like lookup is the best way to handle this situation.

The goal is for my query to leverage the lookup function and prints out all the download events from all these users in the list.

Sample logs

index=proxy123 activity="download"

{
"machine":"1.1.1.1",
"username":"[email protected]",
"activity":"download"
}

{
"machine":"2.2.2.2",
"username":"[email protected]",
"activity":"download"
}

{
"machine":"3.3.3.3",
"username":"[email protected]",
"activity":"download"
}

Sample Lookup (username.csv)

users
[email protected]
[email protected]
[email protected]

Current query:

index=proxy123 activity="download" | lookup username.csv users OUTPUT users | where not isnull(users)

Result: 0 (which is not correct)

I probably don't understand lookup correctly. Can someone correct me and teach me the correct way?


Solution

  • In the lookup file, the name of the field is users, whereas in the event, it is username. Fortunately, the lookup command has a mechanism for renaming the fields during the lookup. Try the following

    index=proxy123 activity="download" | lookup username.csv users AS username OUTPUT users | where  isnotnull(users)
    

    Now, depending on the volume of data you have in your index and how much data is being discarded when not matching a username in the CSV, there may be alternate approaches you can try, for example, this one using a subsearch.

    index=proxy123 activity="download" [ | inputlookup username.csv | rename users AS username | return username ]
    

    What happens here in the subsearch (the bit in the []) is that the subsearch will be expanded first, in this case, to (username="[email protected]" OR username="[email protected]" OR username="[email protected]"). So your main search will turn into

    index=proxy123 activity="download" (username="[email protected]" OR username="[email protected]" OR username="[email protected]")
    

    which may be more efficient than returning all the data in the index, then discarding anything that doesn't match the list of users.

    This approach assumes that you have the username field extracted in the first place. If you don't, you can try the following.

    index=proxy123 activity="download" [ | inputlookup username.csv | rename users AS search | format ]
    

    This expanded search will be

    index=proxy123 activity="download" "[email protected]" OR "[email protected]" OR "[email protected]")
    

    which may be more suitable to your data.