Search code examples
splunksplunk-query

Splunk left jion is not giving as exepcted


Requirement: I want to find out, payment card information used in a particular day are there any tele sales order placed with the same payment card information.

I tried with below query it is supposed to give me all the payment card information from online orders and matching payment info from telesales. But i am not giving correct results basically results shows there are no telesales for payment information, but when i search splunk i am finding telesales as well. So the query wrong.

index="orders" "Online order received" earliest=-9d latest=-8d 
    | rex field=message "paymentHashed=(?<payHash>.([a-z0-9_\.-]+))" 
    | rename timestamp as  onlineOrderTime 
    | table payHash, onlineOrderTime 
    | join type=left payHash [search index="orders"  "Telesale order received" earliest=-20d latest=-5m | rex field=message "paymentHashed=(?<payHash>.([a-z0-9_\.-]+))" | rename timestamp as TeleSaleTime | table payHash, TeleSaleTime] 
    | table payHash, onlineOrderTime, TeleSaleTime

Please help me in fixing the query or a query to find out results for my requirement.


Solution

  • If you do want to do this with a join, what you had, slightly changed, should be correct:

    index="orders" "Online order received" earliest=-9d latest=-8d 
    | rex field=message "paymentHashed=(?<payHash>.([a-z0-9_\.-]+))" 
    | stats values(_time) as onlineOrderTime by payHash
    | join type=left payHash 
        [search index="orders" "Telesale order received" earliest=-20d latest=-5m
        | rex field=message "paymentHashed=(?<payHash>.([a-z0-9_\.-]+))" 
        | rename timestamp as TeleSaleTime 
        | stats values(TeleSaleTime) by payHash ]
    | rename timestamp as onlineOrderTime 
    

    Note the added | stats values(...) by in the subsearch: you need to ensure you've removed any duplicates from the list, which this will do. By using values(), you'll also ensure if there're repeated entries for the payHash field, they get grouped together. (Similarly, added a | stats values... before the subsearch to speed the whole operation.)


    You should be able to do this without a join, too:

    index="orders" (("Online order received" earliest=-9d latest=-8d) OR "Telesale order received" earliest=-20d)) 
    | rex field=_raw "(?<order_type>\w+) order received"
    | rex field=message "paymentHashed=(?<payHash>.([a-z0-9_\.-]+))"
    | stats values(order_type) as order_type values(_time) as orderTimes by payHash
    | where mvcount(order_type)>1    
    

    After you've ensured your times are correct, you can format them - here's one I use frequently:

    | eval onlineOrderTime=strftime(onlineOrderTime,"%c"), TeleSaleTime=strftime(TeleSaleTime,"%c")
    

    You may also need to do further reformatting, but these should get you close


    fwiw - I'd wonder why you were trying to look at Online orders from only 9 days ago, but Telesale orders from 20 days ago to now: but that's just me.