Stata: Duplicates within a 5-minute range


For the following dataset example:

 11-12-2014 21:59
 11-12-2014 21:59
 11-12-2014 22:00
 11-12-2014 22:06

I need to regard observations that are less than five minutes apart as duplicates and use them in a "bysort" command afterwards. Does anyone know how I can define duplicates to be observations that are <5 minutes apart?
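To apply any grouping rule to data like this, the timestamps first have to become Stata clock values, which count milliseconds since 01jan1960 00:00. The sketch below assumes the strings are in month-day-year order (the sample is ambiguous; the mask would be "DMY hm" for day-month-year). The names stamp, t and gap_min are illustrative only. It also computes the gap in minutes to the previous observation, which makes the five-minute criterion easy to inspect.

```stata
clear
input str16 stamp
"11-12-2014 21:59"
"11-12-2014 21:59"
"11-12-2014 22:00"
"11-12-2014 22:06"
end

* Assumption: month-day-year order; clock values must be stored as double
generate double t = clock(stamp, "MDY hm")
format t %tc

* Minutes since the previous observation (missing for the first one)
sort t
generate gap_min = (t - t[_n-1]) / msofminutes(1)
list
```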


Solution

This is an incomplete answer: for clarity I used simple numbers rather than Stata time values, but it shows the fundamental idea.

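    clear
    input float x
     1
     3
     9
    13
    17
    end
    * the running comparison assumes the data are in time order
    sort x
    * run holds the value of the first observation in the current set
    generate run = x in 1
    * replace works through the observations in order, so run[_n-1] is already updated;
    * a new set starts once x is 5 or more past the start of the current set
    replace run = cond(x < run[_n-1] + 5, run[_n-1], x) if _n > 1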

This gives the following result, which shows that the variable run identifies sets of "duplicate" observations under your criterion.

    . list
    
         +----------+
         |  x   run |
         |----------|
      1. |  1     1 |
      2. |  3     1 |
      3. |  9     9 |
      4. | 13     9 |
      5. | 17    17 |
         +----------+
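Filling in the part left out above, here is a sketch of the same idea on Stata clock values, followed by the bysort step the question mentions. It assumes month-day-year strings; the names stamp, t, run and n_in_set are illustrative, and msofminutes(5) converts the five-minute window into milliseconds.

```stata
clear
input str16 stamp
"11-12-2014 21:59"
"11-12-2014 21:59"
"11-12-2014 22:00"
"11-12-2014 22:06"
end

* Convert to clock values (milliseconds); assumes month-day-year order
generate double t = clock(stamp, "MDY hm")
format t %tc

* The running comparison only makes sense in time order
sort t

* run holds the time of the first observation in the current set
generate double run = t in 1
replace run = cond(t < run[_n-1] + msofminutes(5), run[_n-1], t) if _n > 1
format run %tc

* Each distinct value of run is one set of "duplicates"
bysort run: generate n_in_set = _N
list
```

Because each set is anchored at its first observation, a sequence such as 21:59, 22:03, 22:07 splits into two sets even though no consecutive gap reaches five minutes. If chained gaps are what you want, compare each time with t[_n-1] instead of with run[_n-1].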