Search code examples
sqlpostgresqlcountsubquerysql-delete

postgres delete rows having low total count


I have a table having contents similar to this:

user_id |   session_id   |       timestamp        |    lat    |    lon     
---------+----------------+------------------------+-----------+------------
       1 | 20081023025304 | 2008-10-23 02:53:04+01 | 39.984702 | 116.318417
       1 | 20081023025304 | 2008-10-23 02:53:10+01 | 39.984683 |  116.31845
       1 | 20081023025304 | 2008-10-23 02:53:15+01 | 39.984686 | 116.318417
       2 | 1020081023055305 | 2008-10-23 05:53:05+01 | 39.984094 | 116.319236
       2 | 1020081023055305 | 2008-10-23 05:53:06+01 | 39.984198 | 116.319322
       2 | 1020081023055305 | 2008-10-23 05:53:11+01 | 39.984224 | 116.319402
       2 | 1020081023055305 | 2008-10-23 05:53:16+01 | 39.984211 | 116.319389
       2 | 1020081023055305 | 2008-10-23 05:53:21+01 | 39.984217 | 116.319422
       2 | 1020081023055305 | 2008-10-23 05:53:23+01 |  39.98471 | 116.319865
       2 | 1020081023055305 | 2008-10-23 05:53:28+01 | 39.984674 |  116.31981
       3 | 2020081023124523 | 2008-10-23 12:45:23+01 | 39.927938 | 116.338967
       3 | 2020081023124523 | 2008-10-23 12:45:24+01 | 39.927527 | 116.338899

It appears trips with few instances, which are of no value to my analysis are influencing my statistical computations (outliers). I would like to delete all trips with few rows.

I therefore want to delete trips with low sampling count (say trips with row count <5 in the case).


Solution

  • You can use the delete ... using syntax:

    delete from mytable t
    using (
        select session_id
        from mytable 
        group by session_id
        having count(*) < 5
    ) t1
    where t1.session_id = t.session_id