Search code examples
sqlsqlitejoinlimit-per-group

SQL : max occurence(s) for each value


I have a very simple table (LOG) containing attributes MAC_ADDR, IP_SRC, IP_DST, URL, PROTOCOL. I want the first n lines containing IP_SRC, URL, #OfOccurrences ordered by decreasing #OfOccurrences for each IP_SRC in my table when PROTOCOL='DNS'.

To be clearer, I want to be able to list the first n most visited pages for each IP_SRC in my table.

I can get the most visited URL for each IP_SRC like this :

select ip_src,url,cnt
from (
    select ip_src,url,count(*) as cnt,protocol
    from log as b group by ip_src,url order by ip_src,cnt desc
) as c
where cnt>=(select MAX(cpt)
            from (select count(*) as cpt from log as b
            where c.ip_src==b.ip_src group by ip_src,url)
           )
      and protocol='DNS';

However, this solution is obviously not optimized.

Here is a more practical code (for the most visited URL for each IP_SRC) :

select ip_src,url,cnt
from (select ip_src,url,count(*) as cnt
      from log where protocol='DNS'
      group by ip_src,url
      order by ip_src,cnt asc)
group by ip_src;

This second option is way more faster ! However, I want the n most visited pages for each IP_SRC, and I can't figure out how to do.

Thanks for your help.


Solution

  • Finally, with the use of a temporary table, I could manage to get what I wanted.

    --First create a temp table of occurences
    CREATE TEMPORARY TABLE TEMP1 AS
    SELECT ip_src,url,count(*) AS cnt
    FROM LOG
    WHERE protocol='DNS'
    GROUP BY ip_src,url
    ORDER BY ip_src,cnt,url DESC;
    --Then use a classic limit per group query
    SELECT T1.ip_src,T1.url,T1.cnt
    FROM TEMP1 AS T1
    WHERE T1.url in (
          SELECT T2.url
          FROM TEMP1 AS T2
          WHERE T2.ip_src=T1.ip_src and T2.cnt>=T1.cnt
          ORDER BY T2.cnt DESC
          LIMIT 3 --Or whatever you want it to be
    )
    ORDER BY T1.ip_src ASC,T1.cnt DESC;
    

    If someone knows how to do the same without the need of a temp table (or explain me why a temp table is a good solution), please express yourself.