Search code examples
sqllogparser

Log parser 2.2 Query to count Unique user


I've been trying to build a query over a custom log of mine where I sort the users based on certain criteria to have some overview of them.

My log contains a entry for each time a user tries to download a file, that entry contains date, ip, a custom generated token and how many times that user has tried.

The token is stored by SESSION and a token is only valid for 5 attempts of downloading, So that means that one ip can have multiple users(with different tokens) that each have different amount of attempts.

What I want to achieve is rather simple, I want to group the users by ip, and then count their amount of attempts, and then find out how many users there are.

The amount is not counted per IP but rather per token meaning a log entry may look like this:

IP                 TOKEN      ATTEMPT
111.111.111.111    DK1234     a1
111.111.111.111    DK9876     a1
111.111.111.111    DK9876     a2
222.222.222.222    DK5432     a1

Below is my latest attempts of trying to achieve this, but while I try to make the logic behind it work it just isn't what I want.

(The fields involved are: Ip, Token and Attempt (The attempt value looking like this: a1, a2, a3 and so on for each attempt the user makes).)

SELECT 
    Ip,
    CASE TO_INT(replace_chr(Attempt, 'a', '')) 
        WHEN 1 
        THEN 
            'MUL' 
        ELSE 
            'ONE' 
    END 
    AS Users,
    SUM(TO_INT(replace_chr(Attempt, 'a', ''))) AS Attempts
FROM 
    --LOG PATH
WHERE 
        Status = 'SUCCESS' 
    and 
        TO_DATE(TO_TIMESTAMP(LDate, 'dd/MM/yyyy-hh:mm:ss')) > SUB( TO_LOCALTIME(SYSTEM_TIMESTAMP()), TIMESTAMP('8','d') )
GROUP BY
    Ip,
    Users

If I could somehow store a value to increase for each unique Token per IP and store it with the results, but I cannot / do not know a way to achieve this either.

Using DISTINCT won't work either because when I do I get a error saying that DISTINCT cannot work with GROUP BY and my SUM() / Possible COUNT() won't work when Ip isn't in a GROUP BY

(The snippet below is what I have tried with DISTINCT / count)

SELECT 
    Ip,
    COUNT(DISTINCT Token),
    SUM(TO_INT(replace_chr(Attempt, 'a', ''))) AS Attempts
FROM 
    --Log Path
WHERE 
        Status = 'SUCCESS' 
    and 
        TO_DATE(TO_TIMESTAMP(LDate, 'dd/MM/yyyy-hh:mm:ss')) > SUB( TO_LOCALTIME(SYSTEM_TIMESTAMP()), TIMESTAMP('8','d') )
GROUP BY
    Ip

How I'd like my result grid to end up: (Without the explanation text of course)

IP                 Users     Attempts
123.456.789.012     4           4 (4 users each trying one time)
120.987.654.321     2           5 (2 users, One user tried once and the other user tried 4 times)
444.444.444.444     1           1 (One user, one attempt)

I hope I'm making sense, otherwise I'll be happy to elaborate / explain anything needed :)


Solution

  • I believe you need two stages. The first stage collapses the entries per-user:

    SELECT 
        Ip,
        Token,
        MAX(TO_INT(replace_chr(Attempt, 'a', ''))) AS Attempts
    FROM 
        ...
    GROUP BY
        Ip,
        Token
    

    The second stage then rolls up by Ip:

    SELECT 
        Ip,
        COUNT(*) AS Users,
        SUM(Attempts) As TotalAttempts
    FROM 
        ...
    GROUP BY
        Ip