Search code examples
sqlpostgresqlgroup-byaggregate-functions

PostgreSQL - group by distinct pairs of values


I have something like a netflow table and would like to group it in such a way that it groups it by (src_ip, src_port, dst_ip, dst_port) where the values could be swapped between the src and dst fields.

src_ip src_port dst_ip dst_port bytes_sent
192.168.1.1 123 192.168.10.5 321 111
192.168.10.5 321 192.168.1.1 123 222
10.0.0.5 50 172.0.0.5 55 500
172.0.0.5 55 10.0.0.5 50 300
192.168.1.1 123 192.168.10.5 321 1000
192.168.1.1 123 192.168.10.5 20 999

I would like to have the following result from this table:

src_ip src_port dst_ip dst_port bytes_sent bytes_recv
192.168.1.1 123 192.168.10.5 321 1111 222
10.0.0.5 50 172.0.0.5 55 500 300
192.168.1.1 123 192.168.10.5 20 999 0

Basically, trying to capture the traffic going both ways in a single row. So something like grouping by (src_ip, src_port) and (dst_ip, dst_port) where those values could be inverted. What would be the best way to achieve this?


Solution

  • In order to decide which IP, port and direction you must have a rule who you consider sender and who receiver in your aggregated result. Let's make the lesser IP the source and the greater IP the destination. Then it's just about the same CASE expression again and again to decide which original column to put in which result column. Once this is done, aggregate your data.

    with 
      data as
      (
        select
          case when src_ip < dst_ip then  src_ip      else  dst_ip      end as source_ip,
          case when src_ip < dst_ip then  dst_ip      else  src_ip      end as dest_ip,
          case when src_ip < dst_ip then  src_port    else  dst_port    end as source_port,
          case when src_ip < dst_ip then  dst_port    else  src_port    end as dest_port,
          case when src_ip < dst_ip then  bytes_sent  else  0           end as sent,
          case when src_ip < dst_ip then  0           else  bytes_sent  end as received
        from mytable
      )
    select
      source_ip, source_port, dest_ip, dest_port,
      sum(sent) as bytes_sent,
      sum(received) as bytes_received
    from data
    group by source_ip, source_port, dest_ip, dest_port
    order by source_ip, source_port, dest_ip, dest_port;