Search code examples
sqloracle-databasedatediffdate-difference

Oracle sql: filtering repeated rows that only differ by a tiny amount of time


I have an Oracle table with event alarms, and due to a weird and unknown -to me- condition alarms get repeated sometimes so I was asked to create a service that will remove repeated alarms from that Oracle table.

The condition to consider an alarm (a row in the table) as repeated is that there is another one with the same exact PKN_EVENTNAME and RECEIVEDDATE only differs from the previous one by a tiny amount of time (let's say 10 seconds for example -up or down-).

What I'm trying to do first is to create an Oracle sql statement that will group all alarms by PKN_EVENTNAME separating in each group the ones that are repeated (for later deletion).

I think I'm on the way, but I'm stuck. What can I try next?

My SQL so far:

select t1.ID, t1.PKN_EVENTNAME, t1.RECEIVEDDATE 
from PARQUIMETERS_ALARMS t1 
where 
  exists
     (select 'x' 
      from   PARQUIMETERS_ALARMS t2 
      where  t1.id <> t2.id and                                              -- Not the same row
             trunc(t2.RECEIVEDDATE) = trunc(t1.RECEIVEDDATE)                 -- Same date
             and abs(t1.RECEIVEDDATE - t2.RECEIVEDDATE) * 24 * 60 * 60 < 10)  -- < 10 sec

enter image description here

Edit

With @Tejash corrections I see different results in Visual Studio Oracle SQL browser, but I'm not being able to understand them. I don't see clear if results are already records to be deleted (repeated alarms) or what.


Solution

  • You can utilize range clause of analytic functions:

    with dups as (
      select t1.*
           , row_number() over (
               partition by PKN_EVENTNAME, RECEIVEDDATE
               order by id
             ) as dup
      from PARQUIMETERS_ALARMS t1
    ), nodups as (
      select * from dups where dup = 1
    ), t as (
      select nodups.ID, nodups.PKN_EVENTNAME, nodups.RECEIVEDDATE
           , count(*) over (
               partition by nodups.PKN_EVENTNAME
               order by nodups.RECEIVEDDATE
               range between interval '10' second preceding and current row
             ) as cnt
      from nodups
    )
    select * from t where cnt = 1
    

    (UPDATED: CTEs dups and nodups were added after OP shown in comment there are duplicate tuples (PKN_EVENTNAME, RECEIVEDDATE).)

    Explanation: after cleaning data passing through nodups CTE, the where condition filters only rows for which there is only single row in last 10 s (which is obviously the current row).