Previous observations

I need your help with the following problem. My data is

ack            det                     uniquenr

02mar2015 10:30:19  27feb2015 17:03:25 1

09mar2015 11:56:44  07mar2015 22:05:52 2

16mar2015 09:59:51  09mar2015 11:08:36 2

16mar2015 09:59:51  09mar2015 11:08:36 2

16mar2015 09:59:51  09mar2015 11:08:36 2

16mar2015 09:59:51  09mar2015 11:08:36 2

24feb2015 15:03:00  21feb2015 10:23:00 3

ack and det are datetimes. det must be matched to the earliest possible following ack with the same unique number. As appears in the example above, det of 09mar2015 is wrong. I would like the result to be:

ack            det                     uniquenr

02mar2015 10:30:19  27feb2015 17:03:25 1

09mar2015 11:56:44  07mar2015 22:05:52 2

09mar2015 11:56:44  09mar2015 11:08:36 2

09mar2015 11:56:44  09mar2015 11:08:36 2

09mar2015 11:56:44  09mar2015 11:08:36 2

09mar2015 11:56:44  09mar2015 11:08:36 2

24feb2015 15:03:00  21feb2015 10:23:00 3

I have tried the following code:

sort uniquenr det
bysort uniquenr (det): gen v1= (ack[_n-1]-det)/(1000*60*60)
gen v2= (ack-det)/(1000*60*60)
gen  bedrenabo=1 if v1<v2
order bedrenabo v1 v2
replace bedrenabo=0 if v1==v2
replace bedrenabo=0 if v1<0
replace bedrenabo=0 if v1>v2
gen double newack1=ack[_n-1] if bedrenabo==1
format newack %tc
order Handling newack1 ack det tra
sort uniquenr det
replace ack=newack1 if newack1!=.

and my result is

    ack            det                     uniquenr newack 

02mar2015 10:30:19  27feb2015 17:03:25 1

09mar2015 11:56:44  07mar2015 22:05:52 2

16mar2015 09:59:51  09mar2015 11:08:36 2 09mar2015 11:56:44

16mar2015 09:59:51  09mar2015 11:08:36 2

16mar2015 09:59:51  09mar2015 11:08:36 2

16mar2015 09:59:51  09mar2015 11:08:36 2

24feb2015 15:03:00  21feb2015 10:23:00 3

The problem seems to arise because there are duplicates in the data and I use [_n-1]. The optimal solution would be to use a replace command which for all observations with the same det and uniquenr replaces all missing values with the known newack - if there is a newack. The command bysort uniquenr det : replace nyack1 = newack1[_n-1] if missing(newack1) gives different amounts of change for each time.

I am not yet familiar with foreach loops, but if that's the way, I am open to it.

Solution

Ok, ok, assuming that I understand the question, here is one possible solution.

Please note, that I've modified you original example to make it more general (otherwise a simple egen will do, but it seems to me that is not what you want).

// Input the data
// @Msh, generally it is helpful if you provide somthing like this with your Q.
clear
input str18 ackstr str18 detstr uniquenr
"02mar2015 10:30:19" "27feb2015 17:03:25" 1
"09mar2015 11:56:44" "07mar2015 22:05:52" 2
"16mar2015 09:59:51" "09mar2015 11:08:36" 2
"16mar2015 09:59:51" "09mar2015 11:08:36" 2
"16mar2015 09:59:51" "09mar2015 11:08:36" 2
"16mar2015 09:59:51" "09mar2015 11:08:36" 2
"24feb2015 15:03:00" "21feb2015 10:23:00" 3
end

gen double ack = clock(ackstr, "DMYhms")
gen double det = clock(detstr, "DMYhms")
format ack det %tc
drop *str

// I will modify the original data to make it more general
replace ack = tc(08mar2015 00:00:00) in 5

// Add row identifiers
gen id = _n

// Create all possible combinations of ack and det withing the same uniquenr
preserve
    keep uniquenr ack
    rename ack new=
    tempfile ack
    save `ack'
restore
joinby uniquenr using `ack'

// Drop ack that are smaller than det
drop if newack < det

// Match the smallest ack to a given det
ds newack id, not
collapse (min) newack (last) `r(varlist)', by(id)

You can also do it with a loop, e.g. in this manner:

// Add row numbers
gen id = _n

// Prepare newack
gen double newack = .
format newack %tc

// Ensure the data is sorted
sort uniquenr ack

// Compute block boundaries
gen row = _n
by uniquenr: gen min = row[1]
by uniquenr: gen max = row[_N]

// Compute new ack
forvalues i = 1/`c(N)' {
    forvalues j = `=min[`i']'/`=max[`i']' {
        if `=ack[`j']' >= `=det[`i']' {
            replace newack = ack[`j'] in `i'
            continue, break
        }
    }
}

// Revert the sorting
sort id

If you have a large dataset, it might make sense to rewrite the loop in mata, which should be faster.